Abstract. The statistical inverse problem of estimating the probability
distribution of an infinite-dimensional unknown given its noisy indirect observation
is studied in the Bayesian framework. In practice, one often considers only 
finite-dimensional unknowns and investigates numerically their probabilities. 
As many unknowns are function-valued, it is of interest to know whether the 
estimated probabilities converge when the finite-dimensional approximations 
of the unknown are refined. In this work, the generalized Bayes formula is 
shown to be a powerful tool in the convergence studies. With the help of the 
generalized Bayes formula, the question of convergence of the posterior
distributions is reduced to the convergence of the finite-dimensional (or any
other) approximations of the unknown. The approach allows many prior
distributions, while the restrictions mainly concern the noise model and the
direct theory.
Three modes of convergence of posterior distributions are considered - weak 
convergence, setwise convergence and convergence in variation. The conver- 
gence of conditional mean estimates is studied. Several examples of applicable 
infinite-dimensional non-Gaussian noise models are provided, including a gen- 
eralization of the Cameron-Martin formula for certain non-Gaussian measures. 
Also, the well-posedness of Bayesian statistical inverse problems is studied. 



1. Introduction 

Statistically oriented infinite-dimensional inverse problems are often described 
as problems where one wants to estimate an unknown function given its randomly 
perturbed indirect observation [18, 40, 49, 97, 112, 141]. We prefer the following
description, which is well suited to the Bayesian framework.

The statistical inverse problem is to estimate the probability distribution of the 
unknown given its randomly perturbed indirect observation. 

In this paper, the unknown $X$ and its observation $Y$ are modeled as random
mappings from a complete probability space $(\Omega, \Sigma, P)$ into some locally
convex Souslin topological vector spaces $F$ and $G$ equipped with their Borel
$\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively. Recall that a Souslin space is a
Hausdorff topological space that is an image of a complete separable metric space
under a continuous mapping. The observations are taken to be of the form
$Y = L(X) + \varepsilon$, where $\varepsilon$ represents random noise, $\varepsilon$ and $X$ are
statistically independent, and $L : F \to G$ is a continuous mapping. The image
measure $\mu_X := P \circ X^{-1}$ on $F$ is called the prior distribution,
and it represents our beliefs about the unknown without any given observations.
Typically, we are given a sample $Y(\omega_0) = L(X(\omega_0)) + \varepsilon(\omega_0)$, which is
produced by an unknown $X(\omega_0)$ and a perturbation $\varepsilon(\omega_0)$ for some
$\omega_0 \in \Omega$. Ultimately, we are after the probability measure
$U \mapsto 1_U(X(\omega_0))$ defined on the Borel sets $U \subset F$. This measure would
determine the unknown $X(\omega_0)$ uniquely since $F$ is a Hausdorff space, which
implies that the singletons are closed sets and therefore belong to the Borel
$\sigma$-algebra $\mathcal{F}$. We get a simple approximation of the function
$\omega \mapsto 1_U(X(\omega))$ on the basis of the given $Y(\omega_0)$ by taking its orthogonal
projection from $L^2(\Omega, \Sigma, P)$ onto $L^2(\Omega, \sigma(Y), P)$, where
$\sigma(Y) = Y^{-1}(\mathcal{G})$ denotes the $\sigma$-algebra generated by $Y$. Recall that for any
$f \in L^2(\Omega, \Sigma, P)$, this projection coincides $P$-almost surely with the
conditional expectation $E[f|\sigma(Y)]$ of $f$ given the $\sigma$-algebra generated by $Y$
(see [39]). Moreover, there exists a measurable real-valued function $\lambda_f$ on $G$
such that $\lambda_f(Y(\omega)) = E[f|\sigma(Y)](\omega)$ $P$-almost surely. We take
$E[1_U(X)|\sigma(Y)](\omega_0)$ (or more precisely, $\lambda_{1_U(X)}(Y(\omega_0))$) as an estimate
of the probability that the unknown $X(\omega_0)$ belongs to the set $U \in \mathcal{F}$.

When the mappings $U \mapsto E[1_U(X)|\sigma(Y)](\omega_0)$ form a probability measure on
$(F, \mathcal{F})$, which is denoted here by $\mu(U, Y(\omega_0))$, this measure is called the
posterior distribution of $X$ given a sample $Y(\omega_0)$ of $Y$. From the posterior
distribution one may extract information about the unknown $X(\omega_0)$. For example,
the posterior mean may serve as an estimate of the unknown.

The above estimation of the probability distribution of the unknown is generally 
known as the statistical inverse theory (also known as the statistical inversion or 
the Bayesian inversion). We postpone a literature review on the statistical inverse 
theory to Section 1.4. The present paper concentrates on the following three topics
in the statistical inverse theory, inspired by a paper of Lassas et al [96].

(i) Applicability of the generalized Bayes formula for statistical inverse prob- 
lems in locally convex Souslin topological vector spaces. 

(ii) Well-posedness of the Bayesian statistical inverse problem. 

(iii) Convergence of posterior distributions and posterior means for approxi- 
mated unknowns. Especially, finding conditions that guarantee the conver- 
gence of the posterior distributions when the corresponding approximated 
prior distributions converge. 

1.1. Case (i): The generalized Bayes formula. When $X$ and $Y$ have continuous
probability densities with respect to the Lebesgue measure, the conditional
expectations lead to the Bayes formula

(1) $D(x|y) D(y) = D(x, y) = D(y|x) D(x)$

which defines the unique continuous posterior probability density $D(x|y)$ for any
observation $y$ such that $0 < D(y) < \infty$ (see [75]). In (1), the functions
$D(x)$, $D(y)$, and $D(x, y)$ denote the probability densities of $P \circ X^{-1}$,
$P \circ Y^{-1}$, and $P \circ (X, Y)^{-1}$ at $x$, $y$, and $(x, y)$, respectively. If the
observation is of the form $Y = L(X) + \varepsilon$, where $X$ and the noise $\varepsilon$ are
statistically independent, then the conditional density of $Y$ given $X = x$ has the
special form $D(y|x) = D_\varepsilon(y - L(x))$, where $D_\varepsilon$ is the continuous
probability density of the noise $\varepsilon$.
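
To make formula (1) concrete in the simplest setting, the following Python sketch evaluates the posterior density on a grid for a scalar unknown. It is only an illustration under assumptions that do not come from the text: the standard Gaussian prior, the Laplace noise density, the forward map L(x) = x^3, the observed value y = 1, and all function names are hypothetical choices, and the integral D(y) is approximated by a Riemann sum.

```python
import math


def posterior_on_grid(xs, prior_density, noise_density, forward, y):
    """Evaluate D(x|y) = D_eps(y - L(x)) D(x) / D(y) on a uniform grid xs."""
    dx = xs[1] - xs[0]  # uniform grid spacing is assumed
    unnormalized = [noise_density(y - forward(x)) * prior_density(x) for x in xs]
    evidence = sum(unnormalized) * dx  # Riemann-sum approximation of D(y)
    if not (0.0 < evidence < math.inf):
        raise ValueError("D(y) must be finite and non-zero")
    return [u / evidence for u in unnormalized]


def gaussian(t, std=1.0):
    # Hypothetical prior density D(x): centered Gaussian.
    return math.exp(-0.5 * (t / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def laplace(t, scale=0.5):
    # Hypothetical non-Gaussian noise density D_eps: centered Laplace.
    return math.exp(-abs(t) / scale) / (2.0 * scale)


xs = [-5.0 + 0.01 * i for i in range(1001)]
posterior = posterior_on_grid(xs, gaussian, laplace, lambda x: x ** 3, y=1.0)

dx = xs[1] - xs[0]
total_mass = sum(posterior) * dx
posterior_mean = sum(x * p for x, p in zip(xs, posterior)) * dx
```

The check on the evidence mirrors the requirement that D(y) be finite and non-zero. Nothing in this sketch carries over directly to the infinite-dimensional case, where no Lebesgue density is available.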

The availability of the conditional density $D(y|x)$ from the relationship between
unknowns and observations is the key element for the statistical inverse theory. It
makes the expression of the posterior density $D(x|y)$ explicit, opening the way for
exploring the posterior distribution numerically. Unfortunately, infinite-dimensional
probability measures lack probability density functions since there is no
infinite-dimensional Lebesgue measure. Instead of (1), we have

(2) $E[E[1_U(X)|\sigma(Y)] 1_V(Y)] = P(X \in U \cap Y \in V) = E[1_U(X) E[1_V(Y)|\sigma(X)]]$

for all Borel sets $U \in \mathcal{F}$ and $V \in \mathcal{G}$. The distributions of $X$, $Y$, and
$(X, Y)$ are, in principle, known. However, determining $E[1_U(X)|\sigma(Y)]$ explicitly
from the first equality in (2) is in general a hard task, where an explicit expression
of the distribution of $(X, Y)$ is helpful, as in the case of linear Gaussian problems
[100, 102, 105]. On the other hand, the second equality in (2) looks easy enough. For
instance, the dominated convergence of simple functions to the exponential function
shows that

for all $\phi$ and $\psi$ in the dual spaces $F'$ and $G'$, respectively. This suggests that,
after verifying some measurability conditions, we may take

$E[1_V(Y)|\sigma(X)](\omega) = \mu_{\varepsilon + L(X(\omega))}(V)$

$P$-almost surely since $X$ and $\varepsilon$ are statistically independent. Does knowing the
conditional probabilities $E[1_V(Y)|\sigma(X)](\omega)$ help in determining the posterior
distribution? The answer is positive in some cases. If the $\sigma$-algebra $\mathcal{G}$ in
question is countably generated and the conditional distributions of $Y$ given $X$ are
regular and $P$-almost surely absolutely continuous with respect to some fixed
$\sigma$-finite measure $\lambda$ on $\mathcal{G}$ (i.e. they are dominated by $\lambda$), then the
generalized Bayes formula



(3) $\mu(U, y) = \frac{\int_U \frac{d\mu_{Y|X}(\cdot, x)}{d\lambda}(y) \, d\mu_X(x)}{\int_F \frac{d\mu_{Y|X}(\cdot, x)}{d\lambda}(y) \, d\mu_X(x)}$

is known to hold for $U \in \mathcal{F}$ and $\mu_Y$-almost every given observation $Y = y$ such
that the denominator is finite and non-zero [79, 128]. In (3), it is required that the
Radon-Nikodym densities $\frac{d\mu_{Y|X}(\cdot, x)}{d\lambda}(y)$ of the conditional measure
$\mu_{Y|X}(\cdot, x)$ of $Y$ given $X = x$ with respect to $\lambda$ are jointly measurable.
This is sometimes achieved by defining the Radon-Nikodym densities with the help of a
fixed joint density as is done in [128]. In (3), the form of $Y$ is allowed to be more
general than in our restricted case of $Y = L(X) + \varepsilon$, where the posterior
distribution has, for suitable $L$, $X$, and $\varepsilon$, the form

(4) $\mu(U, y) = \frac{\int_U \frac{d\mu_{\varepsilon + L(x)}}{d\lambda}(y) \, d\mu_X(x)}{\int_F \frac{d\mu_{\varepsilon + L(x)}}{d\lambda}(y) \, d\mu_X(x)}$

for all $U \in \mathcal{F}$ and $\mu_Y$-a.e. $y \in G$ such that the denominator is finite and
non-zero. When the Radon-Nikodym densities in (4) are known, the posterior distribution
on $F$ has an explicit representation for all admissible $y \in G$.
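
The ratio-of-integrals structure of the generalized Bayes formula suggests a simple numerical recipe when samples from the prior are available: weight each prior sample by the density of the translated noise at the observed value and form the ratio of the weighted sums. The Python sketch below does this for a finite-dimensional stand-in. Everything specific in it is an assumption made for illustration and not taken from the text: the dominating measure is the Lebesgue measure on the real line, the noise is Laplace, the prior consists of three independent Gaussian coefficients, and the names `forward`, `density_ratio` and `posterior_probability` are hypothetical.

```python
import math
import random


def posterior_probability(prior_samples, density_ratio, indicator, y):
    """Monte Carlo ratio of prior integrals of the density, over U and over F."""
    weights = [density_ratio(x, y) for x in prior_samples]
    denominator = sum(weights)
    if not (0.0 < denominator < math.inf):
        raise ValueError("denominator must be finite and non-zero")
    numerator = sum(w for w, x in zip(weights, prior_samples) if indicator(x))
    return numerator / denominator


def forward(x):
    # Hypothetical continuous forward map L: a weighted sum of the coefficients.
    return x[0] + 0.5 * x[1] + 0.25 * x[2]


def density_ratio(x, y, scale=0.3):
    # Lebesgue density of eps + L(x) at y for Laplace noise eps (assumed model).
    return math.exp(-abs(y - forward(x)) / scale) / (2.0 * scale)


rng = random.Random(0)
# Hypothetical prior: three independent standard Gaussian coefficients.
prior_samples = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20000)]

y_observed = 1.2
p_positive = posterior_probability(
    prior_samples, density_ratio, lambda x: x[0] > 0.0, y_observed
)
p_everything = posterior_probability(
    prior_samples, density_ratio, lambda x: True, y_observed
)
prior_positive = sum(1 for x in prior_samples if x[0] > 0.0) / len(prior_samples)
```

Here `p_positive` estimates the posterior probability of the set of unknowns whose first coefficient is positive, and `prior_positive` is the corresponding prior frequency. The only input that changes with the noise model is `density_ratio`, which is the reason explicit Radon-Nikodym densities are valuable.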

In statistical inverse problems, the generalized Bayes formula for function-valued 
unknowns has been used before in the case of finite-dimensional noise models that 
have probability density functions with respect to the Lebesgue measure [25, 44, 
97, 137] and in the case of infinite-dimensional Gaussian noise models, using in (4) 
the Cameron-Martin formula [63, 96, 137]. The starting point in [25, 26, 137] is 
that the posterior distribution is assumed to have Radon-Nikodym density with 
respect to the prior distribution. Therefore, a formula similar to (3) is used in
[25, 26, 137], but not derived. In [63, 96, 97], the unknown and the noise are
statistically independent. The same seems to be the case in some examples in
[25, 26, 137] but the fact is not emphasized.
Fitzpatrick [44] studied (separable) Banach space-valued unknowns, and wrote
the expression (4) in the case of finite-dimensional observations. As a concrete
example, he used a Gaussian prior distribution on $C([0, 1])$ in the ill-posed inverse
problem of determining the function $q$ in the differential equation $-(qu')' = f$
on $(0, 1)$ from finitely many noisy values of the solution $u$ satisfying the Dirichlet
boundary condition. Lassas and Siltanen [97] used the generalized Bayes formula for 
certain prior random variables on C([0, 1]) and assumed the finite-dimensional noise 
to be Gaussian. Lassas et al [96] and Helin [63] had emphasis on edge-preserving 
prior distributions and used linear forward theory with Gaussian noise, but they 
allowed in (4) also other separable Banach and Hilbert space-valued unknowns, 
respectively. The forward mapping L was assumed to be linear in [63, 96, 97]. 
Cotter et al [25] studied the case of finite-dimensional observations and Banach
space-valued unknowns, and required $L$ to be measurable. Stuart [137] assumed
$L$ to be locally Lipschitz continuous and assumed finite-dimensional or Gaussian
noise. Stuart allowed prior distributions that are absolutely continuous with respect
to some Gaussian measure. Theorem 4.1 in [137] is an abstract generalization
towards allowing certain infinite-dimensional non-Gaussian noise distributions, but
the identification of the notation used there with any statistical inverse problem is omitted.
The same approach is used in [26]. 



In the present paper, we provide (abstract) assumptions on the forward the- 
ory and the noise that are sufficient for the generalized Bayes formula in the case 
of statistical inverse problems in locally convex Souslin topological vector spaces. 
However, such a generalization is not particularly novel by itself, and the
generalized Bayes formula is treated in this work as an important tool for achieving
other results. For example, the study of Case (iii) exploits the generalized Bayes formula.

In this work, we allow infinite-dimensional noise models (as in [63, 94,
96, 116]). One may ask what the benefits of such models are, because any feasible
measuring instrument produces only finite-dimensional observations. For example, 
an analog-to-digital converter performs the weighted averaging and quantization of 
the signal; an X-ray imaging device has a finite number of projection angles and 
a limited resolution of the projection images. In this light, there is no immediate 
need for infinite-dimensional noise models. However, changes in the measuring 
instrument can lead to different posterior distributions and one may wish to choose 
the best finite-dimensional measurement configuration for the problem. As noted
in [96], the mathematical formulation of the infinite-dimensional noise model, when
possible, may be helpful, as it provides an overall framework for the studies. For
some noise sources there even exist physically motivated infinite-dimensional noise
models, like the model of the thermal noise in electric circuits, which arises from
the thermal motion of the charge carriers.

Particular emphasis in this work is on finding tools for dealing with non-Gaussian 
noise in infinite-dimensional statistical inverse problems. There are three reasons 
why the Gaussian noise model is not satisfactory. 

(1) Noise does not always follow a Gaussian distribution closely enough. In
Section 5.4 we discuss the appearance of $\alpha$-stable noise in statistical
inverse problems. An evaluation of finite-dimensional noise models in medical
imaging can be found in [57].

(2) Model approximations - which were studied first by Kaipio and Somersalo 
[76] for finite-dimensional observations - can also produce non-Gaussian 
errors (cf. Remark 14). 
(3) Some noise statistics may not be exactly known. In statistical inverse theory
the inaccuracies in the noise model are further modeled with hierarchical
distributions (see Section 5.5 for a special case).

A wrong noise distribution may cause poor performance of the estimators of the 
unknown. 

In Section 5, we are able to derive (with the help of the generalized Bayes for- 
mula) explicit posterior distributions in some new cases where the noise has non- 
Gaussian infinite-dimensional distribution (Sections 5.3-5.7). It turns out that in 
some cases (Section 5.4) the posterior distribution has a simple expression for the 
infinite-dimensional observations but not for the truncated finite-dimensional obser- 
vations. As a further motivation for the study of infinite-dimensional noise models, 
we suggest that the solutions of infinite-dimensional Bayesian problems may give 
rise to new numerically feasible, but non-Bayesian, approximations of the finite- 
dimensional posterior distributions. 

1.2. Case (ii): Well-posedness of the Bayesian statistical inverse problem. 

The projection operator from $L^2(\Omega, \Sigma, P)$ onto $L^2(\Omega, \sigma(Y), P)$ determines
posterior probabilities $\mu(U, Y(\omega))$ only up to $P$-almost every $\omega \in \Omega$. The
uniqueness of the posterior distribution for a given observation $y$ in the range of
$Y$ is therefore unsettled (note that this form of nonuniqueness has nothing to do
with the uniqueness of the deterministic inverse problem of recovering $x_0$ from
$L(x_0)$). The nonuniqueness is fairly well understood in Gaussian linear problems
[94, 100, 102, 105, 133], where the posterior mean is known to be determined up to a
set of probability zero, but has received limited attention in the general case. For
a Bayesian scientist, such nonuniqueness is discomforting. Two Bayesians using the
same prior distribution and the same observations can, in principle, have different
posterior distributions for some observations (in a set of probability zero). One aim
of the present work is to make the two Bayesians agree on the form of their
posteriors for a given $y \in G$, at least in some special cases. In Theorem 2.4, we
first carefully identify the nonuniqueness of the posterior distributions in locally
convex Souslin topological vector spaces by adopting a new concept, the essential
uniqueness, from the theory of conditional measures to the statistical inverse
theory. Then, we apply a choice first made in [96] and appearing also in
[25, 26, 63, 137], which is to work, if possible, with a fixed version of the
posterior distribution depending continuously on observations in a certain sense.
Evans and Stark suggested even earlier that certain nonuniqueness problems with
conditional expectations could be avoided by using dominated probabilities (see
Remark 3.7 in [40]).

The original part of this work begins in Section 2.2, where we achieve partial 
uniqueness of those posterior distributions that depend continuously on observa- 
tions in the sense that posterior probabilities of Borel sets depend continuously 
on observations (cf. Theorem 2.7). The partial uniqueness gives an unambiguous 
meaning to the posterior distribution at a fixed observation. Moreover, it shows that 
then the Bayesian statistical inverse problem is well-posed - there exists a unique 
posterior distribution that depends continuously on the observations. The method
of using continuous probability densities is widely used in the finite-dimensional
case (see [75]), but does not seem to have been adopted before in abstract
infinite-dimensional problems.

The posterior distributions are further studied in Theorem 2.8, where it is shown 
that the continuous dependence of the posterior probabilities of Borel sets on the 
observations implies the absolute continuity of the posterior distribution with
respect to the prior distribution. We remark that Theorem 2.8 clarifies some of the
differences between the undominated and the $\mu_X$-a.s. dominated cases. Indeed, if
$F$ and $G$ are Polish vector spaces, a result of Macci [103] says that the absolute
continuity of $\mu_Y$-almost all posterior distributions with respect to the prior
distribution is equivalent to the absolute continuity of the measures
$\mu_{\varepsilon + L(x)}$ with respect to the measure $\mu_Y$ for $\mu_X$-a.e. $x \in F$. Hence,
the continuous dependence of the posterior probabilities of Borel sets on
observations is possible only when the measures $\mu_{\varepsilon + L(x)}$ are dominated by
some $\sigma$-finite measure for $\mu_X$-a.e. $x \in F$. What does this mean for the
undominated cases? The posterior probability of at least one Borel set will be
discontinuous as a function of observations. Hence, the corresponding Bayesian
problem is ill-posed - small perturbations of the given sample
can lead to large perturbations of some posterior probabilities. The ill-posedness
in the linear Gaussian statistical inverse problems has been considered before by 
Florens and Simoni [46, 133], who noted that the posterior mean in the Gaussian 
linear case can be ill-posed. Florens and Simoni also showed that the regularizing 
effect of the prior distribution has a limited power in such a case. They suggested 
using an additional Tikhonov regularization in the Gaussian linear case in order 
to obtain approximations of the posterior means that depend continuously on the 
observations. 

We note that the worst-case scenario for discontinuous posterior distributions
on complete separable metric spaces is somewhat characterized in [16], where it is
proved that either the set of all $y \in G$ such that the posterior distributions
$\mu(\cdot, y)$ are mutually singular is (at most) countable or there exists a non-empty
compact perfect set $C \subset G$ and a Borel set $B \subset F \times G$ such that
$1 = \mu(B_y, y) = \mu(F \setminus B_y, y')$ for all $y, y' \in C$ such that $y \neq y'$.
Here $B_y = \{x \in F : (x, y) \in B\}$.

In Theorem 3.4, we present some sufficient conditions that guarantee continuous 
dependence of the posterior probabilities of Borel sets on observations by
using the generalized Bayes formula. Cotter et al [25, 26] have shown a closely
related result which states that under certain conditions (including domination and
Gaussian prior distribution), their version of the posterior distribution is Lipschitz
continuous in finite-dimensional observations with respect to the Hellinger distance.
Our proof relies on the Borel measurability of separately continuous functions - a 
result first obtained by Lebesgue and later generalized by Rudin [124]. The author 
was unable to find the proof of the measurability of separately continuous Souslin 
space-valued functions so the proof is included. 

Note that even in a dominated case the posterior probabilities of Borel sets need
not be continuous on any measurable linear subspace of full $\mu_Y$-measure (see Section
5.4 and Remark 12 for an example). However, the posterior probabilities of Borel
sets in a dominated case are always continuous on certain compact sets of nearly full
measure (cf. Theorem 2.9). Unfortunately, in infinite-dimensional normed spaces
the interior of any compact set is empty. The partial uniqueness of Theorem 2.7
is therefore not generic for infinite-dimensional normed spaces, unless there is a
locally finite union of compact sets $K_i$ such that $\bigcup_{i=1}^{\infty} K_i$ has full
$\mu_Y$-measure and the restriction of $\mu(U, \cdot)$ onto each $K_i$ is continuous, which
guarantees that $\mu(U, \cdot)$ is continuous on the whole $\bigcup_{i=1}^{\infty} K_i$. However,
in Remark 3 we note that for any version of the posterior distribution there always
exists some stronger topology on $G$ that generates the same Borel sets, but makes
the version continuous.
1.3. Case (iii): Posterior convergence. For computational reasons, the unknown
$X$ is often replaced with a finite-dimensional approximation $X_n$, where
$X_n$ is an $F$-valued random variable on the probability space $(\Omega, \Sigma, P)$ with
finite-dimensional range. Instead of exploring the posterior distribution of $X$
given $Y(\omega_0)$, we would like to explore the finite-dimensional posterior
distribution of $X_n$ given $Y_n(\omega_0) = L(X_n(\omega_0)) + \varepsilon(\omega_0)$, which is
$\mu_n(U, Y_n(\omega_0)) = E[1_U(X_n)|\sigma(Y_n)](\omega_0)$. As noted in [94, 96], the value
$Y_n(\omega_0)$ is not given and the common procedure is to replace $Y_n(\omega_0)$ with
$y = Y(\omega_0)$ in the expression of $\mu_n$. We continue to call $\mu_n(\cdot, y)$ the
posterior distribution of $X_n$, even though the replacement - strictly speaking -
brings us out of the Bayesian world. One should note that continuity of the
posterior distribution $\mu_n$ may additionally diminish the distortions in posterior
opinions on $X_n$ that are caused by replacing $Y_n(\omega_0)$ with a close observed value
$Y(\omega_0)$. The question is then: do the posterior distributions $\mu_n(\cdot, y)$ on $F$
(and the posterior means) converge when the approximations are refined?
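
The question can be made tangible in a toy case where everything is available in closed form. The Python sketch below is a hedged illustration, not a result of this paper: it assumes a linear forward map, a scalar observation, and Gaussian prior and noise (chosen only because the posterior mean is then explicit), with hypothetical prior variances $k^{-2}$ and forward weights $1/k$ for the $k$-th coefficient. The approximation $X_n$ keeps the first $n$ coefficients, and the fixed observed value $y$ is plugged into the finite-dimensional posterior, as described above.

```python
def posterior_mean_first_coefficient(n, y, noise_variance=0.1):
    """Posterior mean of the first coefficient of X_n, evaluated at the sample y.

    Assumed toy model (not from the paper): independent Gaussian coefficients
    with variances k**-2, forward map L(x) = sum_k x_k / k, Gaussian noise.
    """
    prior_variances = [k ** -2.0 for k in range(1, n + 1)]
    forward_weights = [1.0 / k for k in range(1, n + 1)]
    # Variance of L(X_n); adding the noise variance gives the variance of Y_n.
    predicted_variance = sum(
        a * a * c for a, c in zip(forward_weights, prior_variances)
    )
    # Gaussian conditioning: Cov(x_1, Y_n) / Var(Y_n).
    gain = prior_variances[0] * forward_weights[0] / (
        predicted_variance + noise_variance
    )
    return gain * y


levels = (1, 2, 4, 8, 16, 32, 64, 128)
means = {n: posterior_mean_first_coefficient(n, y=1.0) for n in levels}
increments = [abs(means[2 * n] - means[n]) for n in levels[:-1]]
```

In this toy model the successive differences in `increments` shrink as the discretization is refined, so the posterior means form a convergent sequence. Whether this happens for non-Gaussian noise and nonlinear $L$ is exactly the kind of question addressed in Case (iii).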

Positive results for the convergence of either the posterior distributions
$\mu_n(\cdot, y)$ or $\mu_n(\cdot, Y_n(\omega))$ have been given by Fitzpatrick [44] in the case of
finite-dimensional observations of separable Banach space-valued unknowns, Lasanen
[94] in the linear Gaussian case, Lassas and Siltanen [97] for the total variation
prior on $C([0, 1])$, Piiroinen [116] in the framework of statistical experiments for
the Souslin space-valued random variables, Lassas et al [96] for certain Banach
space-valued priors (including the Besov prior), Helin [63] for certain Hilbert
space-valued priors (including an edge-preserving hierarchical prior), and Stuart
[137] for a special form $f_n \, d\mu_0$ of the approximating posterior distributions,
where $f_n \in L^1(\mu_0)$ for a Gaussian measure $\mu_0$.

The convergence in [44] is proved for the posterior probabilities of sets $P_n(U)$,
where $U$ is a Borel set and the approximating operators $P_n$, $n \in \mathbb{N}$, are
continuous and converge to the identity on the Banach space. We note that since the
image of a Borel set under a continuous mapping in a Polish space is Souslin (see
Theorem A.3.15 in [12]), the class of all sets $P_n(U)$, where $U \in \mathcal{F}$, is a
subclass of all universally measurable sets. The convergence results in
[63, 94, 96, 116] hold with respect to the weak convergence of measures, i.e.
$\lim_{n \to \infty} \mu_n(f) = \mu(f)$ for all continuous bounded real-valued functions $f$
on $F$. The convergence results in [26, 137] are formulated for the Hellinger distance
of the posterior distributions and the convergence results in [97] for weak
convergence of the posterior distributions of the pointwise values of the unknown
continuous function (i.e. the weak convergence in distribution). In [44, 97], the
observations $Y$ and the random variables $Y_n = L(X_n) + \varepsilon$ were assumed to have
continuous probability densities with respect to the Lebesgue measure. The
convergence results in [63, 94, 96, 97] are formulated for a linear forward theory $L$
in the case of Gaussian noise. The converging posterior distributions in [94] are
evaluated either at samples of $Y = L(X) + \varepsilon$ or at the points
$Y_n = L(X_n) + \varepsilon$, the converging posterior distributions in [116] are evaluated
at the points $Y_n = L(X_n) + \varepsilon$, and the convergence results in [63, 96, 137] are
formulated for fixed versions of the posterior distributions at given samples of
$Y = L(X) + \varepsilon$.

The limit of the posterior distributions may not always be what one expects.
The famous example is the case of the so-called finite-dimensional total variation
priors whose highly appreciated non-Gaussian posterior distributions converge weakly
to a Gaussian distribution as the approximations are refined. Lassas and Siltanen
[97] showed that this problem actually originates from the behavior of the prior
distributions - the random variables $X_n(t)$ obeying total variation priors converge
to Gaussian limits when the discretizations are refined. This discovery, first
conjectured by Markku Lehtinen, has changed the view on how Bayesian statistical
inverse problems should be solved for infinite-dimensional unknowns - one should
construct an infinite-dimensional prior distribution and check that the
corresponding finite-dimensional posterior distributions converge to the right limit.
Otherwise one risks the consistency of the prior knowledge and the consistency of the
posterior distributions with respect to the increase in dimensionality. This guideline
is followed in [63, 94, 96, 116].
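
The mechanism behind this loss of non-Gaussianity can be illustrated with a simplified experiment. The Python sketch below is not the construction of [97]; it assumes, for illustration only, that the discretized path is built from independent Laplace increments and that the increments are rescaled so that the variance of the endpoint stays fixed. Under that assumption the central limit theorem applies, and the excess kurtosis of the endpoint (3 for a single Laplace variable, 0 for a Gaussian) should shrink as the number of steps grows. All function names are hypothetical.

```python
import random


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def laplace_increment(rng):
    # A Laplace variable is an exponential variable with a random sign.
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def endpoint_samples(n_steps, n_samples, rng):
    # Endpoint of a path with n_steps i.i.d. Laplace increments, rescaled so
    # that its variance does not depend on n_steps (assumed scaling).
    scale = n_steps ** -0.5
    return [
        scale * sum(laplace_increment(rng) for _ in range(n_steps))
        for _ in range(n_samples)
    ]


rng = random.Random(1)
kurtosis_by_steps = {
    n: excess_kurtosis(endpoint_samples(n, 20000, rng)) for n in (1, 4, 32)
}
```

With these assumptions the estimated excess kurtosis should be clearly positive for a single step and close to zero for 32 steps, up to Monte Carlo error. The experiment says nothing about which scalings of the prior parameters are appropriate in practice; that analysis is the subject of [97].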

Lassas et al [96] introduced a deterministic function on $F$, called the
reconstructor $R_g$, that coincides a.s. with the conditional expectation of $g(X)$
given $Y$, where $g$ is a measurable function having values in some separable Banach
space. Lassas et al used a clever choice of their reconstructors $R_{1_U}$,
$U \in \mathcal{F}$, which allowed them to state posterior convergence results for any
given observation. The framework effectively transformed a question of originally
probabilistic nature, the convergence of the conditional expectations
$E[1_U(X_n)|Y_n]$, into a question in analysis, the convergence of integrals.
Moreover, it was possible to replace the samples of $Y_n$ with samples of $Y$ in the
posterior distribution. The same technique is extensively used in the present work.

Unlike in [96], the convergence of the posterior distributions in Helin's work 
[63] is not based on approximating the prior random variables directly but on ap- 
proximating the prior probability distributions in the weak topology. We adopt 
his viewpoint, since the posterior distribution depends on the prior random vari- 
able only through its distribution, assuming that the noise and the unknown are 
statistically independent (see Theorem 2.4 and Lemma 3.2). 

Positive results for the convergence of posterior means for approximated
unknowns have been obtained in the linear Gaussian case [94] (a.s. convergence in
the Schwartz space $\mathcal{D}'([0, 1])$), for the total variation prior on $C([0, 1])$ [97]
(for the pointwise values), for exponentially integrable separable Banach
space-valued priors [96] (in the norm topology), for uniformly discretized separable
Hilbert space-valued priors with exponential weights [63] (in the norm topology for
all exponentially bounded functions), and for polynomially bounded functions and
those posterior approximations that have the form $f_n \, d\mu_0$, where
$f_n \in L^1(\mu_0)$ for some Gaussian measure $\mu_0$ [26, 137].

In Section 4, some results in [25, 63, 96, 116, 137] that concern the weak conver- 
gence of posterior distributions and convergence of the posterior means are extended 
in several directions. 

Firstly, we allow prior distributions to be probability measures on a locally
convex Souslin topological vector space $F$, whereas Lassas et al [96] and Helin [63]
applied the generalized Bayes formula for separable Banach and Hilbert
space-valued unknowns, respectively. Posterior distributions in locally convex Souslin
(not metrizable) topological vector spaces have been considered before in the
special case of Gaussian distribution-valued random variables in [94, 100] and in the
general abstract case of Souslin space-valued random variables only in [116]. The
first part of Section 2 is therefore devoted to the basics of the abstract statistical
inverse theory in locally convex Souslin topological vector spaces. Unlike in the
work of Piiroinen [116], where general Souslin space-valued random variables were
first studied in statistical inverse problems, we use the generalized Bayes formula
for the proofs. We also derive the generalized Bayes formula from the equation
$Y = L(X) + \varepsilon$, which also supplements the formulation presented in [25, 26, 137],
where the starting point is a given form of the posterior distribution. This makes
it easier to recognize situations where the generalized Bayes formula holds.

In this work, the use of locally convex Souslin topological vector spaces is mostly 
motivated by the fact that in Souslin spaces the Borel $\sigma$-algebras are regular enough
for the existence of regular conditional measures [13]. Moreover, the class of such
spaces contains many useful spaces, like complete separable metric vector spaces and
spaces of (Schwartz) distributions [129]. We remark that the distribution spaces are
sometimes preferred since the convergence of the characteristic functions of measures
implies the weak convergence of measures for them (this fact is shown e.g. in [13]
and used in [94]). We require $G$ to be a topological vector space since $Y$ is defined as
the sum of two $G$-valued random variables. However, it is well-known that the sum
of two random variables is not always a random variable in arbitrary topological 
vector spaces. In Lemma 3.1, we check that Y is indeed a random variable because 
the sample space G is a Souslin space (the fact is known but the author was unable 
to find a reference for the proof in the literature). We require G and F to be locally 
convex topological vector spaces since locally convex spaces have rich enough dual 
spaces that for example allow the use of characteristic functions in the identification 
of measures. In Remark 1, we note that a locally convex Souslin sample space of Y 
is allowed to be mis-specified by a continuous linear injection without altering the 
posterior distributions. This holds for any statistical inverse problem, not just for 
those admitting the representation (3). 

The main difference of the present Theorem 4.4 to Theorem 4.8 in [116] is that 
we do not require the conditioning cr-algebras Y~^(Q) to be increasing. This is a 
significant difference as it allows more general approximation schemes. On the other 
hand, the present approach utilizes the generalized Bayes formula, which was not 
needed in [116]. Hence, the results in [116] are valid for many noise models that 
arc not covered by our assumptions, like additive undominated noise, multiplicative 
noise, or noise that is statistically dependent on the unknown (we assume that the 
noise is statistically independent from X and all of its approximations X„). Another 
difference from Theorem 4.8 in [116] is that we work with one fixed sample y of 

Y = L{X)+e whereas in [116] it is assumed that we have a sequence {yn} consisting 
of samples of the random variables Yn = L{Xn) + s, which is a drawback when one 
considers realistic observations. The reason for this is that in [116] the convergence 
is shown for the conditional expectations E[/(X„)|tT(F„)](w) (which are equivalent 
to iJ,n{f,Yn{u!))) for P-almost every w S fl). However, in Lemma 4.15 in [116], it 
is explained how Theorem 4.8 can be used in certain cases where the observation 

Y = L{X) + e is given. Namely, the prior distribution of the Souslin space- valued 
random variable $X$ is assumed to be concentrated on a separable Hilbert space
$H$ and its approximations are of the form $X_n = P_n X$, where the operators $P_n$ are
some finite-dimensional linear operators on $H$ that converge to the identity at every
$x \in H$. Assuming that the range of the linear operator $L$ is some separable Hilbert
space $\widetilde{H}$ and the noise $\varepsilon \in \widetilde{H}$ with probability one, Piiroinen constructed certain
projection operators $R_n$, and showed that the sequence of the posterior distributions
of $X_n$ given $R_n(LX + \varepsilon)$ converges weakly to the posterior distribution of $X$ given
$LX + \varepsilon$ as $n \to \infty$. The result of Piiroinen shows, remarkably, that in some cases
less data is adequate, and easier to manage, than full data. We remark that the
required assumptions exclude injective compact linear operators of infinite rank as
$L$. Indeed, if $L : H \to \widetilde{H}$ is any compact linear bijection, then its inverse operator is
bounded by the open mapping theorem. Considering $L L^{-1} = I$ on $\widetilde{H}$ and $L^{-1} L = I$
on $H$, we see that $L$ cannot be compact unless the Hilbert spaces $H$ and $\widetilde{H}$ are
finite-dimensional. On the other hand, compactness of $L$ is a typical feature leading
to the ill-posedness of the inverse problem. But under the assumptions of Lemma
4.15 in [116], the convergence result of Piiroinen is stronger than ours in the sense
that it allows any noise model with the property $\varepsilon \in \widetilde{H}$ with probability 1. However,
the compactness of $L$ is not a restriction for the present convergence results. The
weak convergence of posterior distributions in the linear Gaussian case in [94] is not
fully covered by the present results since some of the cases appearing in [94] are not
$\mu_X$-a.s. dominated.

Secondly, we allow a wider class of noise models than the Gaussian models ap- 
plied in [63, 96]. Theorem 4.4 gives a positive answer to the weak convergence of 
the posterior distributions in locally convex Souslin topological vector spaces, when 
the translations of the noise distribution by $L(x)$ are $\mu_X$- and $\mu_{X_n}$-a.s. dominated
and the corresponding Radon-Nikodym densities satisfy certain measurability and 
uniform integrability conditions. Examples of suitable noise models are given in Sec- 
tion 5, which include the well-known cases of finite-dimensional noise and Gaussian 
infinite-dimensional noise but also four novel models such as spherically invariant 
noise and periodic signals in decomposable noise. 

Theorem 4.6 extends the convergence of the posterior means in [63, 96] for more 
general noise models, and relaxes slightly the integrability properties imposed on 
the approximations of the unknown in [63]. Theorem 4.6 extends also assumptions 
in Theorem 4.10 of [137] for more general approximations of the prior distribu- 
tions (but the mode of convergence is different). Some conditions, which imply the 
posterior convergence of continuous linear functionals for weakly converging prior 
distributions, are presented in Theorem 4.5. The mentioned conditions are indebted 
to the well-known criteria for the convergence of integrals with respect to measures 
that converge weakly. 

Thirdly, we consider stronger modes of convergence for posterior distributions 
than the weak convergence considered in [63, 96]. In Theorem 4.7, we give sufficient 
conditions under which the posterior distributions inherit also the setwise conver- 
gence or the convergence in variation of the approximated prior distributions. Re- 
cently, Stuart has established (see Theorem 4.6 in [137]) an estimate for the speed
of convergence in Hellinger distance of the posterior distributions for the
approximated posterior distributions of the restricted form $\mu_n(dx, y) = f_n(x, y)\,d\mu_0(x)$,
where $f_n(\cdot, y) \in L^1(\mu_0)$ and $\mu_0$ is a Gaussian measure on a Banach space $F$. In the
present Theorem 4.7, the approximated posterior need not be absolutely continuous
with respect to a Gaussian measure, and the approximations $X_n$ need not be
measurable functions of $X$.

Moreover, we allow the direct theory L to be nonlinear (as in the less general 
cases in [25, 26, 137]), which is a minor modification of the linear case in [63, 96], 
but indicates that nonlinearity does not necessarily complicate the mathematical 
convergence, although the exploration of the posterior distribution becomes more 
difficult. The result is not surprising since nonlinearities are frequently handled in 
the stochastic filtering problems [78, 111], which have connections to the statistical 
inverse problems. Throughout the paper, we consider continuous forward mappings
$L : F \to G$, although the existence of posterior distributions requires only their
measurability. However, the continuity of $L$ is utilized in the main results of the
present paper on the convergence of the posterior distributions and on the
well-posedness of the Bayesian statistical inverse problem. It is also consistent with the
usual description of the deterministic inverse problems where continuity holds.

Unlike in [63, 94, 96, 116], the case of approximated observations is not studied 
(nor reviewed) in the present work. By focusing on the approximated unknowns, 
we hope to single out their essential properties. Section 6 contains some examples
of prior approximations. 

Notations: When $G$ is a topological vector space, we denote by $G'$ its topological
dual space. If $m$ is a measure on $G$, we sometimes denote $m(f) := \int f(x)\,dm(x)$.
If $Z : \Omega \to G$ is a random variable, its image measure $P \circ Z^{-1}$ on $G$ is denoted
by $\mu_Z$. A Borel measure and its Lebesgue completion are denoted by the same
symbol.

1.4. A literature review. Statistical inverse theory became a popular method for
solving geophysical problems in the 1980s [138, 139], and has since spread into many
other fields (see [75, 137]). In this short review, we focus on general theoretical
developments that led to the modern description of statistical inverse theory. A
more problem-oriented review of infinite-dimensional Bayesian statistical inverse 
problems can be found in [137], and reviews of statistically oriented inverse problems 
can be found in [19, 141]. A good reference to the computational aspects of finite- 
dimensional statistical inverse problems is [75], to Bayesian statistics [45, 122, 128], 
and to measure theory [13]. 

The statistical background of the statistical inverse theory belongs to the field 
of nonparametric Bayesian inference. Nonparametric statistics is concerned with 
making inferences about infinite-dimensional unknowns whereas parametric statis- 
tics studies finite-dimensional unknowns [7]. The function-valued prior models in
statistical inverse problems are therefore well within the scope of nonparametric sta- 
tistics. We briefly review Bayesian nonparametric statistics and clarify its relations 
to statistical inverse problems. 

Important nonparametric problems are the density estimation problem and the 
regression problem [106]. These two problems have guided the modern development 
of Bayesian nonparametric statistics. 

In the density estimation problem, the observations are i.i.d. samples obeying
some unknown probability distribution that has a density function $f$ (usually on
$\mathbb{R}$), and the objective is to estimate the density function $f$. This problem is not
directly related to our statistical inverse problem but is connected to the general 
development of the research field. It should be mentioned that Wolpert et al [159, 
160] have described a semidiscrete Fredholm integral equation of the first kind as 
a Bayesian density estimation problem. On the other hand, the positron emission 
tomography (PET) imaging is an inverse problem that is usually described as a 
special density estimation problem where only indirect samples are available [72]. 
Hence, certain inverse problems lead to density estimation problems. 

In the regression problem, the observations are of the type

$y_i = K(x_i) + \varepsilon_i$,

where the points $x_i$ lie in a Euclidean space, $i = 1, \dots, n$, and the noise terms
$\varepsilon_i$ are typically independent and identically distributed. The objective is to
estimate the unknown function $K$. This problem has connections to statistical inverse
problems. For example, if the realizations of $X$ and $Y = L(X) + \varepsilon$ are functions on
$\mathbb{R}$ and $L$ is the identity mapping, then $X$ is identified as $K$.

One difference between the density estimation problem and the regression problem
is the nature of the given samples. In the regression problem, a single sample can
also be infinite-dimensional, at least in theory. When the noise is Gaussian, such
infinite-dimensional observation models are often called white noise models (see the
short review in [164]).

The main questions in Bayesian nonparametrics have been the construction of
the prior models, the utilization of the posterior distribution, and the consistency 
of the posterior distributions. 

1.4.1. Prior models. The problem of finding good infinite-dimensional prior models
has a long history. An early application of a function-valued prior model was
carried out in 1896 by Poincaré [118], who applied a random series of the type
$X(t) = \sum_i X_i t^i$ in a regression problem on [0, 1]. He assumed independent normal
distributions on the coefficients $X_i$ and calculated the posterior mean estimate on
the basis of the given values $y_i = X(t_i)$, $i = 1, \dots, n$. In Section V of Chapter 11 in
[117] Poincaré discussed, in his visionary manner, the noisy regression problem. He
proposed that the smoothness of the regression curve follows from the prior infor- 
mation of the unknown curve described in the form of probability distributions. In 
1950, Grenander [58] applied a Gaussian process prior in a linear regression problem 
with additive Gaussian process noise. In 1957-58 Whittle [153, 154] discussed prior 
information on the smoothness of the unknown in certain density estimation prob- 
lems, and later Kimeldorf and Wahba [82] clarified the relations between smoothing 
and Gaussian prior models. Nowadays, regularity of functions is one of the most 
important guidelines in constructing infinite-dimensional prior models in statistical 
inverse problems. This follows from the fact that the priors in statistical inverse 
problems have two objectives. They express the prior beliefs about the unknown and 
are countermeasures against the ill-posedness of the deterministic inverse problem.

In general, the knowledge of infinite-dimensional random variables (and of their
distributions) started to increase after Wiener published his construction of
Brownian motion at the beginning of the 1920s [156]. A decade later, Kolmogorov
[84] introduced a constructive method for defining general infinite-dimensional
random variables in the abstract setting. His method was well suited for countably many
random variables, but Doob noticed that the constructed $\sigma$-algebra was somewhat
limited: for continuous-parameter processes certain interesting sets, such as the set
of all continuous functions, were not measurable with respect to the constructed
$\sigma$-algebra. Doob's remedy was the careful definition of the continuous-parameter
stochastic processes in 1937 [36]. The theory of stochastic processes, and in particular
Doob's definition of separable stochastic processes, provides the tools but not immediate
answers for certain questions in statistical inverse problems. Namely, can the
stochastic process be interpreted as a function-valued random variable that has values in
some nice function space? The question is quite relevant since the direct theory is a
mapping between two function spaces. One can e.g. apply Kolmogorov's continuity
theorem (proven by Kolmogorov in 1934, see [134]). Another approach is to directly
define probability measures on function spaces. Jessen [69] carried out integration
on the infinite-dimensional torus equipped with the coordinate-wise
convergence. M. Fréchet initiated the study of random variables in metric spaces (see
[50]). His emphasis was on different modes of convergence and typical values of
random variables, like the mean and the median. Significant contributions to the
theory of probability measures on topological spaces were given by Alexandrov and
Prohorov (see [149]). Later developments can be found in the books of Bogachev
[12, 13], Gelfand and Vilenkin [53], Gihman and Skorokhod [55], Kahane [73], Kuo
[91], Ledoux and Talagrand [98], Schwartz [129], Vakhania [147], Xia [161], and 
Yamasaki [163]. Typical points discussed in these books are the existence of mea- 
sures, invariance properties of measures, topological supports, the equivalence and 
the equality of measures, the convergence of random series and the convergence
of measures, all relevant properties for prior distributions. The existence of
measures is often based on the Bochner-Minlos theorem that gives conditions for the
one-to-one correspondence between measures $\mu$ and their characteristic functions
on certain spaces. From the point of view of statistical inverse
problems, it is unfortunate that direct connections between the characteristic
function and the included prior information are not known. Therefore, it is no wonder
that popular prior models have been described by other means, for example with
infinite product measures and random series expansions. The works of Karhunen [80]
in the 1940s on series expansions of Gaussian random variables, nowadays known as
the Karhunen-Loève expansion, are in this sense important. The Karhunen-Loève
expansion was first used in the 1950s and 1960s for expanding infinite-dimensional
data [30, 58, 81], which made the Bayesian method of conditional mean estimation
and the non-Bayesian method of likelihood ratio testing tractable. It was later
adopted for describing infinite-dimensional unknowns (see for example [27]), but its
main application has been in providing finite-dimensional approximations of Gauss- 
ian random variables. At present, other orthogonal expansions of Gaussian random 
variables are available [12]. The pioneering work of Mandelbaum [105] from 1984 on 
linear Gaussian statistical inverse problems relies on such series expansions of the 
Gaussian random variables. Other works on Gaussian priors in statistical inverse 
problems are [46, 94, 100, 102, 133]. 
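
The role of the Karhunen-Loève expansion in providing finite-dimensional
approximations can be illustrated with a small sketch. It is not taken from the
cited works: it uses the classical expansion of standard Brownian motion on [0, 1],
with eigenvalues $1/((k - 1/2)^2 \pi^2)$ and eigenfunctions $\sqrt{2}\sin((k - 1/2)\pi t)$,
and the function names below are hypothetical. The truncated series has pointwise
variance that approaches $t$, the variance of Brownian motion, as the number of
terms grows.

```python
import math
import random


def kl_eigenpair(k):
    """k-th Karhunen-Loeve eigenvalue and eigenfunction of Brownian motion on [0, 1]."""
    freq = (k - 0.5) * math.pi
    return 1.0 / freq ** 2, (lambda t: math.sqrt(2.0) * math.sin(freq * t))


def truncated_variance(t, n_terms):
    """Variance at time t of the n_terms-term truncation of the expansion."""
    total = 0.0
    for k in range(1, n_terms + 1):
        lam, e = kl_eigenpair(k)
        total += lam * e(t) ** 2
    return total


def sample_truncated_path(t_grid, n_terms, rng):
    """Draw one path of the truncated expansion on the given time grid."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    pairs = [kl_eigenpair(k) for k in range(1, n_terms + 1)]
    return [
        sum(c * math.sqrt(lam) * e(t) for c, (lam, e) in zip(coeffs, pairs))
        for t in t_grid
    ]


# The truncated variance approaches Var B(t) = t as more terms are kept.
for n_terms in (5, 50, 500):
    print(n_terms, truncated_variance(0.3, n_terms))

path = sample_truncated_path([i / 10 for i in range(11)], 50, random.Random(0))
```

Each truncation level corresponds to a finite-dimensional Gaussian approximation of
the same infinite-dimensional random variable.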

In 1963, Freedman introduced the class of tail-free priors for the density
estimation problem [51]. In the 1970s the density estimation and the regression problem
evolved further in different directions. Wahba et al [82, 150] took the approach with
smoothing splines and Gaussian random series in the regression problem, and
Ferguson [43] constructed Dirichlet process priors, which are certain random
measure-valued unknowns, for the density estimation problem. In the case of Dirichlet
processes, the space of the unknowns is the space of all probability measures on the
fixed measure space equipped with the Borel $\sigma$-algebra with respect to the weak
topology of measures. The Dirichlet process priors have similar properties in the
density estimation problem as Gaussian priors have in the linear statistical inverse
problems. Namely, the posterior distribution is the distribution of another
Dirichlet process with updated parameters. In both cases, the calculations of the
posterior distribution are based on similar elements, which are the properties of the
finite-dimensional distributions and the properties of the martingales [43, 105]. The
Dirichlet process priors were generalized later to mixtures of Dirichlet processes (see
[41]). Summaries of the prior distributions applied in modern density estimation
problems can be found in [21, 151].

In 1990, Steinberg [135] suggested a prior model defined as a random series in
which Hermite polynomials were multiplied by either improper or Gaussian
coefficients. During 1998-2000, Abramovich et al [1, 2, 3] suggested random wavelet
expansions in Besov spaces, with hierarchical coefficients whose hyperparameters
guaranteed the sparseness of the expansions, as priors for the regression problem.
In the 1990s, mixtures of Gaussian measures were also suggested as priors for the
regression problem (see [164]). Recently, Lassas et al [96] and Helin [63] constructed
non-Gaussian edge-preserving priors suitable for statistical inverse problems. Besov
space priors introduced by Lassas et al are defined with random wavelet expansions
and Helin's hierarchical prior distributions as mixtures of Gaussian measures. In
2010, Stuart [137] applied prior distributions of the type $f(x)\,\mu(dx)$, where $f \in L^1(\mu)$
and $\mu$ is a Gaussian measure. In the abstract setting, the statistical inverse theory
was applied for unknowns described as Souslin space-valued random variables in
[116].

It should be mentioned that some combinations of prior information do not
have faithful probabilistic descriptions. In 1987, Backus [5] pointed out that hard
constraints, such as the boundedness of an infinite-dimensional random variable $X$
in norm, can lead to trouble if one also assumes isotropy. A well-known example
is a Gaussian random variable $X$ that is invariant with respect to rotations (e.g.
orthogonal transformations) on an infinite-dimensional separable Hilbert space $H$
but satisfies $\|X\|_H = \infty$ with probability one [12].
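
The divergence of the norm can be seen numerically. The following is only a sketch
under the assumption that the rotation-invariant Gaussian variable has independent
standard normal coordinates with respect to an orthonormal basis; the function name
is hypothetical. By the law of large numbers, the squared norm of the first $n$
coordinates grows like $n$, so it cannot stay bounded as $n \to \infty$.

```python
import random


def squared_norm_of_truncation(n, rng):
    """Squared Euclidean norm of n independent standard normal coordinates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(1)
dims = [10, 100, 1000, 10000]
norms = [squared_norm_of_truncation(n, rng) for n in dims]
ratios = [s / n for s, n in zip(norms, dims)]

# The ratio ||P_n X||^2 / n stays close to 1, so ||P_n X||^2 itself diverges.
for n, s, r in zip(dims, norms, ratios):
    print(n, round(s, 1), round(r, 3))
```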

1.4.2. Utilization of the posterior distribution. We first look at the history of infinite- 
dimensional posterior distributions in nonparametric statistics and in statistical 
inverse problems. 

In Bayesian nonparametrics, both problems, the density estimation and the
regression problem, are solved with conditional probability measures. This part
of the solution mechanism is exactly the same as in statistical inverse theory. In
the 1930s, the rigorous definition of the conditional expectation by Kolmogorov [84]
made it possible to define conditional probability measures in the abstract
infinite-dimensional setting, but it was soon noted that such conditioning did not always
produce a probability measure. The works of Doob [37] and Dieudonné [34, 35] led
to the definition of a regular conditional probability, which is a random probability
measure with probability one. The existence of regular versions of all conditional
probabilities was verified by applying certain properties of the space of the unknowns
in the works of Rohlin [123], Jiřina [70, 71], and Sazonov [127]. Nowadays, one either
checks the properties of the space of the unknowns (as in [116]) or always checks
the regularity of the acquired conditional measure for the chosen prior distribution
(as in [43]). The former is used in theoretical studies for avoiding pathological cases
[63, 94, 96, 116] while the latter is convenient in practical solutions where a fixed
version is needed [25, 137]. We remark that the non-existence of a regular version
is known only for some conditional measures in exemplifying cases (see [13, 120]).

A major step for the statistical inference for stochastic processes was the
emergence of the so-called filtering problems in the 1940s by Wiener [155], Kolmogorov [85]
and Krein [87, 88]. In particular, Wiener's [155] straightforward method of solution
(by ergodicity and least squares estimation) encouraged others to later take further
steps towards the Bayesian nonparametric approach [48, 58, 154]. A good review
on developments in the filtering theory is [74]. An interesting work in the filtering 
theory is [9], where it is shown that the solution of the filtering problem depends 
continuously on the distribution of the unknown. A nice collection of nonlinear 
filtering problems with Gaussian noise can be found in [104]. 

The first deliberate unions of inverse problems and Bayesian statistics were seen
in the 1960s in the form of statistical regularization, i.e. minimum mean squared error
estimation for the Gaussian linear inverse problem
with the finite-dimensional unknown $X$ and the finite-dimensional observation $Y$.
That is, one seeks the estimator $\hat{X}(Y)$ that minimizes $E[\|X - \hat{X}(Y)\|^2]$ (i.e.
the conditional mean). In 1961, motivated by Wiener's filtering theory, Foster
[48] presented a solution to the estimation problem (5). Other motivation for the
Gaussian approach arose from the regularization method of Phillips [115], generalized
later by Twomey for Fredholm integral equations of the first kind [144], and
from the Tikhonov regularization method. During 1967-71 Turchin et al (see [143]
and references therein), independently of Strand and Westwater [136], replaced
the regularization method by a statistical framework that utilized a Gaussian prior
distribution. The approach led to Franklin's infinite-dimensional description [49]
of the minimum mean squared error estimator of a Hilbert space-valued Gaussian
unknown whose linear observations were corrupted by an additive Gaussian white
noise. The connection between [49] and regularization methods in reproducing
Hilbert spaces was studied by Prenter and Vogel [119]. The first work that con-
tained the existence of regular conditional probabilities and an explicit formula for 
the posterior distribution in a linear infinite-dimensional inverse problem was the 
seminal paper of Mandelbaum [105] on Hilbert space-valued Gaussian random
variables. The value of the result for inverse problems was first recognized by Lehtinen
et al [100] who generalized it for the Gaussian (Schwartz) distribution-valued
random variables. This work of Lehtinen et al can be considered as the starting point
of the infinite-dimensional Bayesian inverse problems. The case of Banach
space-valued Gaussian random variables was later considered by Luschgy [102]. In these
works, the expression of the posterior mean is obtained by using the equivalence
between statistical independence of Gaussian random variables and their
orthogonality in $L^2(P)$. The key factor is the orthogonal random series expansion of the
Gaussian observation, a method used by Grenander [58], and even by Poincaré
[118]. Cox [27] applied Gaussian separable Banach space-valued unknowns in a
linear regression problem with additive Gaussian noise. The approach of Cox
differs from that of Mandelbaum since it uses the generalized Bayes formula rather
than the special properties of Gaussian random variables (see Proposition 2.1 in
[27]). An abstract formulation of Bayesian statistical inverse problems for Souslin
space-valued random variables was given by Piiroinen [116], who only required the
observation and the unknown to be Souslin space-valued random variables, thus
allowing nonlinear direct problems and more complicated noise terms.

Little is known about the form of posterior distributions in infinite-dimensional 
statistical inverse problems outside the Gaussian linear case [94, 100, 102, 105, 133] 
and the dominated case with Gaussian noise [63, 96, 164]. When F and G are 
complete separable metric spaces, a result of Macci [103] tells that the Lebesgue 
decomposition of the posterior distribution with respect to the prior distribution 
contains a nontrivial singular part in undominated cases. Namely, the Lebesgue 
decomposition of the posterior distribution with respect to the prior distribution 
is of the form $\mu(\cdot, y) = \mu^{(ac)}(\cdot, y) + \mu^{(s)}(\cdot, y)$, where the absolutely continuous part
$\mu^{(ac)}(\cdot, y)$ is determined by the absolutely continuous part of $\mu_{Y|X}(\cdot, x)$ with respect
to $\mu_Y$ through the equations




where $U \in \mathcal{F}$. Moreover, the singular part $\mu^{(s)}(\cdot, y)$ is determined by the singular
part of $\mu_{Y|X}(\cdot, x)$ with respect to $\mu_Y$ through the equations

from which one chooses a regular version. We remark that in such undominated
cases, one may expect to meet some surprises. The posterior distribution then gives
positive probability to some events that had prior probability zero, i.e. that seemed
to be a priori impossible.

The extraction of information from the posterior distribution involves decision 
theory, including point estimation and hypothesis testing (for the general descrip- 
tion of the Bayesian decision theory, see [128]). A decision theoretic view towards 
Bayesian inversion is given in [40], where one performs the estimation of the
(separable Banach space-valued) unknown by first fixing the prior distribution of the
unknown $X$ and choosing a loss function $\ell : F \times F \to \mathbb{R}$ that penalizes the
inaccuracies in the estimates of the unknown, and then choosing the so-called Bayes
estimator $\hat{X} : G \to F$, which is a deterministic function that gives the smallest
averaged loss $E[\ell(\hat{X}(Y), X)]$. This is equivalent to taking as each $\hat{X}(Y(\omega))$ the value
$d$ that minimizes the posterior expected loss $E[\ell(d, X) \mid \sigma(Y)](\omega)$.

Common point estimators in finite-dimensional statistical inverse problems are 
the maximum a posteriori (MAP) estimator and the conditional mean (CM) es- 
timator (i.e. the posterior mean) [75]. The CM estimator $\hat{X}(Y) = E[X \mid \sigma(Y)]$
minimizes the posterior risk for the squared error loss function $\ell(x', x) = |x' - x|^2$
(when $X : \Omega \to \mathbb{R}^n$ is suitably integrable) [75]. Conditional means have ap-
peared also in the framework of infinite-dimensional statistical inverse problems 
[63, 94, 96, 97, 100, 102, 105]. However, the decision-theoretic justification is often 
neglected, and the conditional mean is reported just as a typical value of the pos- 
terior distribution. Other notions of typical values for distributions on separable 
metric spaces were considered by Frechet [50]. 
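
The minimizing property of the CM estimator can be checked numerically in a toy
case. The sketch below is an illustration only, not a construction from the cited
references: the "posterior" is a hypothetical one-dimensional Gaussian mixture
represented by samples, and the posterior expected squared error loss is
approximated by a sample average. Among a grid of candidate values $d$, the minimizer
agrees with the sample mean up to the grid resolution.

```python
import random


def posterior_expected_loss(d, samples):
    """Sample-average approximation of E[(d - X)^2 | Y = y]."""
    return sum((d - x) ** 2 for x in samples) / len(samples)


rng = random.Random(0)
# Hypothetical bimodal posterior: a two-component Gaussian mixture.
samples = [
    rng.gauss(-2.0, 0.5) if rng.random() < 0.4 else rng.gauss(3.0, 1.0)
    for _ in range(2000)
]

cm_estimate = sum(samples) / len(samples)           # conditional mean
candidates = [-5.0 + 0.01 * k for k in range(1001)]  # grid of decisions d
losses = [posterior_expected_loss(d, samples) for d in candidates]
best_candidate = candidates[losses.index(min(losses))]

print("CM estimate:", cm_estimate)
print("grid minimizer of the squared error loss:", best_candidate)
```

Note that the minimizer lies between the two modes of the mixture, which shows why
the CM estimate is better regarded as a decision-theoretic quantity than as a
"typical value" of the posterior.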

The mean of a locally convex Hausdorff topological vector space-valued random
variable can arise from different definitions, depending on the space in question.
In general, the (weak) mean of a locally convex Hausdorff topological vector
space-valued random variable $X$ is a vector $m \in F''$ (or more generally, $m$ in the algebraic
dual space of $F'$) such that $\langle m, \phi \rangle_{F'',F'} = E[\langle X, \phi \rangle_{F,F'}]$ for all $\phi \in F'$ (see
[12]). Such a notion of vector-valued integration was developed by Pettis [114] in
1933 for reflexive separable Banach spaces $F$. Gelfand used a similar definition
for distribution-valued random variables (see [53]). The Pettis-Gelfand integral
was generalized for quasi-complete Souslin space-valued functions by Thomas [142].
For Banach space-valued random variables having integrable norm, a mean can be
defined also as the Bochner integral $m = \int_F x\,d\mu_X(x)$, introduced in the early 1930s by
Bochner (see [33] and references therein).

When the posterior distribution $\mu(\cdot, y)$ is known for a given sample $y$ of $Y$,
the (weak) conditional mean $m \in F''$ is a vector that satisfies $\langle m, \phi \rangle_{F'',F'} =
\int \langle x, \phi \rangle_{F,F'}\,\mu(dx, y)$ for all $\phi \in F'$. When $F$ is a separable reflexive Banach space
and $\|X\|$ is integrable, the same posterior mean can also be defined as the Bochner
integral (see Proposition V.2.5 in [108]).

We remark that the weak posterior mean $E[X \mid \sigma(Y)]$ is a Bayes estimator in a
weak sense, i.e. it gives the smallest averaged loss for the family of loss functions
$\ell_\phi(x, x') = |\langle x - x', \phi \rangle|^2$, where $\phi \in F'$. Franklin [49] used such a requirement when
he defined the best linear estimator in a Gaussian linear inverse problem in 1970.
An earlier approach to the best linear estimator in the function-valued Gaussian case
was given by Grenander in 1950 (see Chapter 6 in [58]). He considered a Gaussian
linear regression problem and identified the best linear estimator (with respect to
the pointwise squared error loss $\ell(\hat{X}(t), X(t)) = |\hat{X}(t) - X(t)|^2$) with the posterior
mean. Grenander used infinite-dimensional observations, but he made simultaneous
inferences on only finitely many values of the unknown function. Moreover, he
required, but did not prove, the regularity of the conditional probabilities. In this
sense, his approach to the posterior means was still far from the description of
Mandelbaum from 1984 [105]. We remark that the technique of estimating the value of
$X(t)$ on the basis of infinitely many observations is still the standard in modern
filtering theory [111]. Also in Bayesian density estimation, the estimation is
sometimes carried out either in the form $\hat{X}(t) = E[X(t) \mid Y_1, \dots, Y_n]$, where $X$ is
the unknown probability density function on, say, [0,1] (see [154]), or in the form
$\hat{X}(U) = E[X(U) \mid Y_1, \dots, Y_n]$, where the sets $U \subset [0,1]$ are Borel sets and $X$ is an
unknown random probability measure (see [106] and Proposition 4.2.1 in [54]). The
density estimator $\hat{X}$ is a Bayes estimator with respect to the squared error loss
function for each $t$ or for each Borel set $U$, respectively. Another option is to
use a weighted $L^2$-loss function $\ell(\hat{X}, X) = \int |\hat{X}(t) - X(t)|^2\,dw(t)$ [43]. The two
estimators coincide when $X$ is suitably integrable.

In the works of Mandelbaum [105] and Luschgy [102], the space $F$ is a Hilbert or
Banach space, and the posterior mean is defined as a Bochner integral. However, the
emphasis is on the Gaussian nature of the prior, and the posterior mean is calculated
as $E[\sum_{i=1}^{\infty} X_i e_i \mid \sigma(Y)] = \sum_{i=1}^{\infty} E[X_i \mid \sigma(Y)]\, e_i$. A similar approach appears in [94,
100] for the distribution space, where the posterior mean is defined in the weak sense.
The weak definition of the posterior mean is used also in [97] for the space $C([0, 1])$.
In [63, 96], the conditional mean of a separable Banach space-valued random variable
is defined as a Bochner integral with respect to the posterior distribution. Before
Luschgy, Krug [89] determined the posterior mean of a separable Banach
space-valued Gaussian unknown in a linear Gaussian case, but he assumed that the given
observation was finite-dimensional.

We remark that when $F$ is a Hilbert space, one can take $\ell(x', x) = \|x - x'\|_F^2$
as the loss function that gives the CM estimator. As in the finite-dimensional case,
the main point is that

$E[\|\hat{X}(Y) - X\|_F^2] = E[\|\hat{X}(Y) - E[X \mid \sigma(Y)]\|_F^2]
  + 2\,E[\langle \hat{X}(Y) - E[X \mid \sigma(Y)],\, E[X \mid \sigma(Y)] - X \rangle_F]
  + E[\|E[X \mid \sigma(Y)] - X\|_F^2]$,

and the additional difficulty is just in checking that
$E[\langle f, X \rangle \mid \sigma(Y)] = \langle f, E[X \mid \sigma(Y)] \rangle$ for $\sigma(Y)$-measurable $f$, which makes
the middle term vanish. Such loss functions have been used in the regression problem
for the Gaussian mixture priors when $F = L^2([-1, 1])$ [164]. Instead of an $L^2$-loss
function, Abramovich et al [1] used an $L^1$-loss function in a regression problem for a
discretized Besov space-valued unknown. We note that a common approach in the
regression problem is to present only the Bayes estimates instead of the whole
posterior distribution.
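
The step that removes the cross term in the decomposition above can be written out
as follows. This is a sketch under the assumptions that $F$ is a separable Hilbert
space, $X$ is square integrable, and conditional expectations commute with the inner
product for $\sigma(Y)$-measurable vectors; the abbreviations $m(Y)$ and $g(Y)$ are
introduced here only for illustration.

```latex
% Notation introduced for this illustration only:
%   m(Y) := E[X | \sigma(Y)],   g(Y) := \hat{X}(Y) - m(Y)   (both \sigma(Y)-measurable).
\begin{align*}
E\big[\langle g(Y),\, m(Y) - X \rangle_F\big]
  &= E\Big[ E\big[\langle g(Y),\, m(Y) - X \rangle_F \,\big|\, \sigma(Y)\big] \Big] \\
  &= E\Big[ \langle g(Y),\, m(Y) \rangle_F
        - E\big[\langle g(Y),\, X \rangle_F \,\big|\, \sigma(Y)\big] \Big] \\
  &= E\big[ \langle g(Y),\, m(Y) \rangle_F - \langle g(Y),\, m(Y) \rangle_F \big] = 0 ,
\end{align*}
% so that
\begin{equation*}
E\big[\|\hat{X}(Y) - X\|_F^2\big]
  = E\big[\|\hat{X}(Y) - m(Y)\|_F^2\big] + E\big[\|m(Y) - X\|_F^2\big]
  \;\ge\; E\big[\|m(Y) - X\|_F^2\big],
\end{equation*}
% with equality exactly when \hat{X}(Y) = m(Y) almost surely.
```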

Luschgy [102] made an (unproven) remark that for Gaussian posterior distribu- 
tions the conditional mean is the Bayes estimator for every symmetric quasi-convex 
(measurable) loss function $\ell(x, x') = \ell(x - x')$. A proof can be found in [15], where
it is derived from the Anderson property of Gaussian measures (for the property, 
see [101]). 

In finite-dimensional spaces, the MAP estimator can be interpreted as a limit 
of Bayes estimators for the 0-1-valued losses £e{x',x) ~ ^py-sfx e)i^')^ where e — )• 
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[122]. Here B{x,e) is the closed ball in F that is centered at x and has radius e. 
Lassas and Siltanen [97] showed that MAP estimates can behave inconsistently as 
dimensionality of the unknown increases, even though the posterior distributions 
converge at the same time. In their example, the MAP estimates actually vanish 
at the limit, regardless of the given observation. Similar result is proved in [64] for 
a hierarchical edge-preserving prior. The MAP and CM estimates coincide for the 
finite-dimensional Gaussian priors, and numerical results demonstrate that they can 
practically coincide for the finite-dimensional approximations of Besov-priors [83]. 
Cotter et al [25] discussed MAP estimation in the context of infinite-dimensional 
Bayesian problems. They showed that there exists a minimizer for a penalized 
log-likelihood function, which has a form similar to that in the case of a finite-dimensional
Gaussian unknown. However, the conditions that would relate the penalized
log-likelihood function to any posterior density were omitted in [25], which leaves open
the question of what connection the minimizer has to the infinite-dimensional
posterior distributions. The result of Lassas and Siltanen [97] calls for at least
some caution. Another attempt towards MAP estimation with infinite-dimensional
Gaussian priors is given by Hegland [62]. Unfortunately, the proof of Proposition 1
in [62] is not rigorous, as it involves subtraction of two numbers that are infinitely 
large with probability 1 (i.e. the Cameron-Martin norms of arbitrary vectors in the 
space of the unknowns). 
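
The coincidence of the MAP and CM estimates for finite-dimensional Gaussian priors can be checked numerically. The following minimal sketch assumes a scalar linear model $Y = aX + \varepsilon$ with independent Gaussian $X$ and $\varepsilon$; the parameter values `a`, `c`, `s`, the observation `y` and the helper `minimize_convex` are hypothetical choices made only for this illustration. The CM estimate is taken from the closed-form Gaussian conditional mean, while the MAP estimate is found by numerically minimizing the negative log posterior density.

```python
# Scalar linear Gaussian model (all numbers are illustrative assumptions):
# X ~ N(0, c), Y = a*X + eps, eps ~ N(0, s), with X and eps independent.
a, c, s = 2.0, 1.5, 0.5
y = 0.7  # an arbitrary observed value


def neg_log_posterior(x):
    """Negative log posterior density of X given Y = y, up to a constant."""
    return 0.5 * (y - a * x) ** 2 / s + 0.5 * x ** 2 / c


def minimize_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a strictly convex function."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# CM estimate: conditional mean of X given Y = y for a jointly Gaussian pair.
cm_estimate = c * a / (a * a * c + s) * y
# MAP estimate: maximizer of the posterior density, found numerically.
map_estimate = minimize_convex(neg_log_posterior, -10.0, 10.0)

print(cm_estimate, map_estimate)
```

The two numbers agree up to the tolerance of the search. The sketch says nothing about the infinite-dimensional case, where no Lebesgue density is available.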

In infinite-dimensional statistical inverse problems, hypothesis testing has
been largely neglected, although several interesting questions could be raised. For
example, Fitzpatrick [44] took the initiative in testing whether the evidence supports
the homogeneity of the unknown diffusion coefficient. Hypothesis testing was also
proposed for some nonparametric statistical inverse problems in [11] within the
classical framework. However, it was pointed out in [11] that the problems can be
handled similarly by (finite-dimensional) Bayesian methods, but this remark
is not elaborated further.

Another approach to exploiting posterior distributions was given by Piiroinen 
[116]. He interpreted the posterior distributions as statistical measurements, which
allowed comparisons of information contents of different posterior distributions. The 
result is especially useful in experimental design [99]. 

1.4.3. Posterior consistency. The consistency of the posterior distributions (with
respect to repeated independent observations) is closely connected to the uniqueness
of the deterministic inverse problem of determining $x$ from $y = L(x)$. The
pioneering work of Doob [38] on martingales touched the question of consistency of
the posterior distributions. Doob's results imply that under model identifiability
(i.e. the measures $\mu_{\varepsilon+L(x)}$ are different for different $x \in F$) the
posterior distributions would concentrate (in the weak topology of measures) on the true
unknown $x_0$ $\mu_{L(x_0)+\varepsilon}$-almost surely for $\mu_X$-almost every $x_0$ when
infinitely many i.i.d. observations would be available. The consistency of the posterior
distribution is an important topic because it shows that enough data will guide a
Bayesian scientist almost surely to the true answer. The words $\mu_X$-a.s. made Doob's
approach slightly impractical as they left open the frequentist case where the
observations are not samples of $L(X) + \varepsilon$ but samples of $L(x) + \varepsilon$ for
some fixed $x$. Freedman [51] demonstrated that inconsistency could hold on topologically
large sets. The problem was approached by Schwartz [130] who described a set of unknowns
$x$ for which consistency holds $\mu_{\varepsilon+L(x)}$-almost everywhere under some
decision theoretic conditions and domination (i.e. all measures
$\{\mu_{\varepsilon+L(x)} : x \in F\}$ are assumed to be absolutely continuous with respect to
some common $\sigma$-finite measure). The required property is the positive prior
probability of all Kullback-Leibler neighborhoods of the unknown $x$. Consistency has
also been studied in other topologies besides the weak topology. Barron et al [6]
proved consistency in Hellinger distance. Summaries of consistency results in density
estimation can be found in [31, 32, 162]. The case of Gaussian regression has been
studied in [148], where certain probabilities are shown to converge.

Consistency issues in inverse problems are discussed in [40]. For our statistical
inverse problem, the consistency corresponds to observing one sample of
$Y = L(X) + \frac{1}{\sqrt{n}}\varepsilon$, where $n$ represents the number of i.i.d.
observations of $L(x) + \varepsilon$. The works of Hofinger and Pikkarainen [67, 68], and
Neubauer and Pikkarainen [107] on finite-dimensional Gaussian statistical inverse
problems concern the question of posterior consistency. They studied the convergence of
posterior distributions and the posterior means in linear Gaussian inverse problems for
finite-dimensional random variables as the variance of the noise decreases. In
particular, it was shown in [67] that the posterior distributions given observed values
$y_{\delta_n} = Lx + \delta_n\varepsilon(\omega)$ of $Y_{\delta_n} = LX + \delta_n\varepsilon$,
for a sequence $\delta_n \to 0$, converge to the point mass on the true value $x$ in the
Ky Fan metric, assuming that the prior distributions are also modified appropriately.
Neubauer and Pikkarainen [107] studied posterior convergence rates for finite-dimensional
approximations of Hilbert-space-valued random variables when the approximation level
increases in a certain manner as the noise level $\delta_n$ approaches zero. However,
the convergence was shown only for unknowns in a set of prior measure zero (the
Cameron-Martin space of the prior distribution). Florens and Simoni [46] also studied
the posterior consistency for the infinite-dimensional linear Gaussian inverse problems
when the variance of the noise diminishes. They were able to show the posterior
consistency of the posterior measures with respect to the weak topology (and give
estimates for the speed of convergence of the posterior means) by assuming that the
direct theory is regular enough and the prior distribution depends suitably on the noise
level.
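
The concentration of the posterior as the noise level decreases can be seen in the simplest conjugate setting. The sketch below is only an illustration under stated assumptions: a scalar model $Y = aX + \delta\varepsilon$ with a fixed Gaussian prior (unlike [67], the prior is not modified with the noise level), hypothetical values for `a`, `c` and `x_true`, and one fixed noise realization `eps_sample`. It evaluates the closed-form Gaussian posterior mean and variance for a decreasing sequence of noise levels.

```python
import random

# Illustrative scalar model (all values are assumptions, not from the text):
# Y = a*X + delta*eps, with prior X ~ N(0, c) and noise eps ~ N(0, 1).
a, c = 2.0, 1.5
x_true = 0.8                         # fixed "true" unknown generating the data
random.seed(0)
eps_sample = random.gauss(0.0, 1.0)  # one fixed realization of the noise


def gaussian_posterior(y, delta):
    """Mean and variance of X given Y = y in the conjugate Gaussian model."""
    variance = 1.0 / (a * a / delta ** 2 + 1.0 / c)
    mean = variance * a * y / delta ** 2
    return mean, variance


deltas = [1.0, 0.1, 0.01, 0.001]
results = []
for delta in deltas:
    y = a * x_true + delta * eps_sample
    mean, variance = gaussian_posterior(y, delta)
    results.append((delta, mean, variance))
    print(f"delta={delta:g}  mean={mean:.6f}  variance={variance:.3e}")
```

In this toy case the posterior variance shrinks monotonically and the posterior mean approaches `x_true`, so the posterior concentrates near the point mass at the true value.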

Another convergence topic that has received more attention in statistical inverse 
problems is the posterior convergence for approximated unknowns and observations 
[63, 94, 96, 97, 116, 137]. This case has been discussed above in the introduction. 

Almost all known convergence results for posterior distributions [4, 63, 94, 95,
96, 97, 137] are based on the known form of the posterior distribution. There are
also some measure-theoretic approaches for convergence of conditional expectations.
The results of Ganssler and Pfanzagl [52] showed that the conditional expectations
$E[1_U(X_n) \mid Y]$ converge when the joint distributions of the observation and the
approximated unknowns are dominated by some $\sigma$-finite measure and the corresponding
Radon-Nikodym densities converge almost everywhere. Furthermore, they also
showed that there exists a regular version for which the convergence holds almost
surely in variation. Here one should pay attention to the fact that the conditioning
$\sigma$-algebra does not depend on $n$, which is not satisfactory from the point of view of
numerical solutions of statistical inverse problems. Landers et al [93] generalized
this result for monotonic sequences of conditioning $\sigma$-algebras $\sigma(Y_n)$. It should be
noted that in statistical inverse problems, the $\sigma$-algebras $\sigma(L(X_n) + \varepsilon)$ are
usually not increasing. Krikkeberg [86] proved a martingale type convergence theorem for
not necessarily monotonic $\sigma$-algebras, but his conditions seem to be too abstract for
the statistical inverse problems in the present form. A reformulation of his conditions
in terms of random variables $(X_n, Y_n)$ would give valuable information on the
almost sure posterior convergence in the general undominated case. Goggin [56] and
Crimaldi et al [28, 29] studied conditions under which the convergence of $(X_n, Y_n)$
to $(X, Y)$ implies the convergence of the conditional expectations $E[f(X_n) \mid Y_n]$ to
$E[f(X) \mid Y]$ in distribution or in probability. Their results are not satisfactory for
statistical inverse problems, since they do not say anything about the almost sure
convergence of posterior distributions for fixed samples of $Y_n$, but the necessary
conditions in [29] are also valid for a.s. convergence. For example, the results in
[29] imply that setwise convergence of the prior distributions is necessary for the
setwise convergence of the posterior distributions (given samples of $Y_n$). We remark
that samples of $Y$, not $Y_n$, are usually given. For the undominated case, a
result of Berti et al [8] somewhat simplifies the study of posterior convergence. Under
quite general conditions, their result reduces the problem of almost sure weak
convergence of random measures to the study of only countably many sequences
of conditional expectations. Piiroinen gave a sufficient condition that guarantees
the convergence of posterior distributions when unknowns and observations are
approximated [116]. The emphasis in his results was on obtaining with probability
1 the posterior convergence for the Souslin space-valued approximated unknowns
given samples of multi-indexed observations of the corresponding approximated
unknowns. His proof relies on improving the a.s. convergence of the conditional
expectations of each function of the type $f(X_n)g(X)$, where $f$ and $g$ are continuous
and bounded, to almost sure weak convergence of posterior distributions.

A concept close to the posterior convergence is the so-called discretization
invariance, which was first used by Markku Lehtinen in the 1990s (see [96]). It asks that the
prior knowledge is consistent at all discretization levels and aims at the stability
of posterior knowledge on different discretization levels. Definitions for
discretization invariance in statistical inverse problems are given in [96, 97]. In [96], Lassas
et al defined a proper linear discretization $X_n = P_n X$ of a Banach space-valued
random variable $X$, where $P_n$ are bounded linear operators on the Banach space
$F$ having finite-dimensional ranges and the random variables $\langle P_n X, \phi \rangle$ converge in
distribution to $\langle X, \phi \rangle$ for all $\phi \in F'$. Gaussian priors and Besov space priors were
shown to be discretization invariant in [96] in the sense that they have proper linear
discretizations for which the conditional mean estimates converge. An important
example was studied by Lassas and Siltanen [97] who showed that the finite-dimensional
total variation priors converge to a Gaussian measure and the corresponding
CM estimates converge to the CM estimate obtained with a Gaussian prior. The
total variation priors are not discretization invariant as the finite-dimensional prior
distributions lead to unwanted effects. A special method for obtaining stable
posterior knowledge was suggested by Kaipio and Somersalo [76], who proposed the
approximation error approach for statistical inverse problems. In the approximation
error approach, the conditioning random variable $Y = L(X) + \varepsilon$ is written as
$Y = L(X_n) + (L(X) - L(X_n)) + \varepsilon$, where $L(X) - L(X_n)$ is taken to be an additional
noise term $e$. For example, if $X$ is Gaussian and $X_n = P_n X$, where $P_n$ are linear
projection operators, the CM estimators take a consistent form $E[X_n \mid Y] = P_n E[X \mid Y]$.
The problem becomes computationally more tractable if $X_n$ and $e$ are statistically
independent, in which case only the distribution of $e$ needs to be additionally
determined. This condition is often forced on $e$ together with a numerically feasible
approximated distribution [76, 140].

2. Conditional probabilities and posterior distributions

2.1. Solution of the statistical inverse problem. We define exactly what we
mean by a statistical inverse problem and its solution. We begin by recalling the
definition of the conditional expectation.

The conditional expectation of $f \in L^1(\Omega, \Sigma, P)$ given a sub-$\sigma$-algebra
$\Sigma_0 \subset \Sigma$ is a $\Sigma_0$-measurable function $E[f \mid \Sigma_0]$ such that

\[
\int_A f \, dP = \int_A E[f \mid \Sigma_0] \, dP
\]

for all $A \in \Sigma_0$. Conditional expectations exist due to the Radon-Nikodym theorem
as the densities of the (signed) measure $f\,dP$ with respect to the measure $P$ on
$\Sigma_0$, but they are only defined up to sets $N \in \Sigma_0$ of $P$-measure zero. We denote

\[
E[\,\cdot \mid Y] = E[\,\cdot \mid Y^{-1}(\mathcal{G})].
\]
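
On a finite probability space the defining property of the conditional expectation can be verified directly. The following minimal sketch assumes a six-point sample space with hypothetical weights `P`, a hypothetical function `f`, and a sub-$\sigma$-algebra generated by the partition `partition`; none of these appear in the text. The conditional expectation is the average of `f` over each atom, and the integral identity is checked over every set of the sub-$\sigma$-algebra using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Finite probability space Omega = {0, ..., 5} with hypothetical weights.
P = {0: Fraction(1, 12), 1: Fraction(1, 6), 2: Fraction(1, 4),
     3: Fraction(1, 4), 4: Fraction(1, 6), 5: Fraction(1, 12)}
f = {w: Fraction(w * w) for w in P}      # an integrable function f
partition = [{0, 1}, {2, 3, 4}, {5}]     # atoms generating Sigma_0


def integral(g, A):
    """Integral of g over the set A with respect to P."""
    return sum(g[w] * P[w] for w in A)


def conditional_expectation(g, atoms):
    """E[g | Sigma_0]: constant on each atom, equal to the average of g there."""
    ce = {}
    for atom in atoms:
        value = integral(g, atom) / sum(P[w] for w in atom)
        for w in atom:
            ce[w] = value
    return ce


ce = conditional_expectation(f, partition)

# Sigma_0 consists of all unions of atoms (including the empty union).
sigma0 = [set().union(*combo)
          for r in range(len(partition) + 1)
          for combo in combinations(partition, r)]

# Defining property: the integrals of f and E[f | Sigma_0] agree on Sigma_0.
print(all(integral(f, A) == integral(ce, A) for A in sigma0))
```

Since all atoms here have positive probability, the conditional expectation is unique; with atoms of probability zero it could be chosen arbitrarily there, which is the finite analogue of being defined only up to sets of $P$-measure zero.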

Definition 2.1. Let $(\Omega, \Sigma, P)$ be a complete probability space. Let $F$ and $G$ be
two Souslin spaces equipped with their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively.
Let $X : \Omega \to F$ and $Y : \Omega \to G$ be measurable mappings. We call a mapping
$\mu : \mathcal{F} \times G \to [0,1]$ a solution of the statistical inverse problem of estimating the
distribution of the unknown $X$ given the observation $Y$ if

(1) $\mu(U, Y(\omega)) = E[1_U(X) \mid Y](\omega)$ $P$-almost surely for every $U \in \mathcal{F}$,

(2) $y \mapsto \mu(U, y)$ is $\mu_Y$-measurable for every $U \in \mathcal{F}$, and

(3) $U \mapsto \mu(U, y)$ is a probability measure on $(F, \mathcal{F})$ for every $y \in G$.

The distributions $\mu(\cdot, y)$ are called posterior distributions of $X$ given $Y = y$.

Strictly speaking, the posterior distributions are defined a posteriori of the
observation $Y(\omega)$ but we feel that there is no harm in calling $\mu(\cdot, y)$ posterior distributions
also for $y \notin R(Y)$ since $\mu_Y(G) = 1$.

The solution is just a regular conditional distribution of $X$ given the
sub-$\sigma$-algebra $Y^{-1}(\mathcal{G})$, where the regularity holds in the sense of Doob i.e. the solution
$\mu$ is $\mu_Y$-measurable in the second variable (see Remark 10.6.3 in [13] for a further
discussion). The nature of the mapping $\omega \mapsto \mu(U, Y(\omega))$, which need not be
$\sigma(Y)$-measurable, is verified in the following simple lemma.

Lemma 2.2. Let $(G, \mathcal{G})$ be a measurable space. Let $Y : \Omega \to G$ be a measurable
mapping from a complete probability space $(\Omega, \Sigma, P)$ into $G$. If $f : G \to \mathbb{R}$ is a
$\mu_Y$-measurable function then $f(Y)$ is a random variable on $(\Omega, \Sigma, P)$, and
$E[f(Y)] = \int f(y) \, d\mu_Y(y)$. Moreover, if $\tilde{f} : G \to \mathbb{R}$ is a Borel measurable function such that
$\tilde{f} = f$ $\mu_Y$-a.s., then $\tilde{f}(Y(\omega)) = f(Y(\omega))$ $P$-almost surely and
$E[\tilde{f}(Y) \mid \Sigma_0](\omega) = E[f(Y) \mid \Sigma_0](\omega)$ $P$-almost surely for any
sub-$\sigma$-algebra $\Sigma_0 \subset \Sigma$.

Proof. Every $\mu_Y$-measurable function has a Borel measurable version (see
Proposition 2.1.11 in [13]). Denote by $\tilde{f}$ a Borel measurable version of $f$. The set
$N = \{y \in G : f(y) \neq \tilde{f}(y)\} \in \mathcal{G}^{\mu_Y}$ is then $\mu_Y$-zero measurable and, by definition,
there exists a $\mu_Y$-zero measurable Borel set $B \in \mathcal{G}$ such that $N \subset B$. Especially,
$Y^{-1}(N) \subset Y^{-1}(B)$, which has $P$-measure $P(Y^{-1}(B)) = \mu_Y(B) = 0$ so that also
$Y^{-1}(N)$ belongs to the complete $\sigma$-algebra $\Sigma$ and has $P$-measure zero. Therefore,
$\tilde{f}(Y(\omega)) = f(Y(\omega))$ $P$-almost surely. By the completeness of $\Sigma$, the mapping
$\omega \mapsto f(Y(\omega))$ is also $\Sigma$-measurable. By the almost sure equivalence of the functions, we get

\[
E[f(Y)] = E[\tilde{f}(Y)] = \int \tilde{f}(y) \, d\mu_Y(y) = \int f(y) \, d\mu_Y(y).
\]

The conditional expectations of equivalent random variables coincide, since they
have the same integrals over $\Sigma_0$-measurable sets. □

From the point of view of the posterior analysis, Condition 3 of Definition 2.1
may give a false sense of security. Any $\mu$ that satisfies Conditions 1 and 2 but is
a probability measure only for $\mu_Y$-a.e. $y$ can be redefined on a negligible set in
such a way that it is a solution. For example, if $N$ is the $\mu_Y$-zero measurable set
that contains all $y$ for which $\mu(\cdot, y)$ is not a probability measure, we may redefine
$\mu(U, y)$ as $1_U(x_0)$ for some fixed $x_0 \in F$ and all $y \in N$. Then $\mu$ satisfies Conditions
1, 2 and 3, but $\mu(\cdot, y)$ is not related to the unknown when $y \in N$.

We briefly compare the solution $\mu$ with other formulations of Bayesian inverse
problems. Clearly, any regular conditional distribution $\mu$ of $X$ given $Y$ (such that
$y \mapsto \mu(U, y)$ is Borel-measurable for any $U \in \mathcal{F}$) qualifies as a solution. Especially,
posterior distributions obtained by the Bayes formula (1) on $\mathbb{R}^n$ for positive
continuous probability densities form a solution of the form $\mu(U, y) = \int_U D(x \mid y) \, dx$
[75]. The Gaussian conditional probabilities in [94, 100, 102, 105] are also solutions
that are allowed to be $\mu_Y$-measurable in the sense of Condition 2. Our approach is
similar to the work of Piiroinen [116], where a general formulation of the statistical
inverse problem for Souslin space-valued random variables first appeared. The
difference is that Piiroinen chose the posterior probabilities $\mu(U, y)$ to be universally
measurable with respect to the second variable, that is, $m$-measurable for any finite
Radon measure $m$ on $(G, \mathcal{G})$, whereas we prefer to take all $\mu_Y$-measurable versions
as solutions, since it helps to avoid the somewhat artificial modifications of
$\mu_Y$-measurable functions (encountered for example in the Gaussian case [100]) to any
universally measurable or $\mathcal{G}$-measurable functions. Lassas et al [96] used a different
approach where the posterior distribution was obtained by defining reconstructors.
A mapping $y \mapsto \mathcal{R}(g, y)$ is called a reconstructor of $g \in L^1(\mu_X)$ (more generally, a
Bochner integrable $g$) given the observation $Y$ if
$\mathcal{R}(g, Y(\omega)) = E[g(X) \mid Y^{-1}(\mathcal{G})](\omega)$
almost surely [96]. The concept of a reconstructor is more elemental than our
solution. However, the reconstructors that were used for solving the statistical inverse
problem in [63, 96] were chosen to be more regular. They depend continuously
on observations and also satisfy Conditions 1 and 3. Hence, they form a regular
conditional distribution and are especially solutions in the sense of Definition 2.1.
A common point of the reconstructor and our solution is that both are defined
for all $y \in G$, not only for samples $Y(\omega) \in R(Y)$. However, the simplicity of the
reconstructors comes with some disadvantages. Namely, if the reconstructor of the
unknown $X$ does not originate from a regular conditional distribution, some power
of the Bayesian inference is lost, as there is no posterior probability distribution to
draw from. Furthermore, two reconstructors $\mathcal{R}_1$ and $\mathcal{R}_2$ of the same function $f$ may
differ on a "large" set $Y(N) \subset G$, where
$N = \{\omega : \mathcal{R}_1(f, Y(\omega)) \neq \mathcal{R}_2(f, Y(\omega))\} \in \Sigma$
has probability zero. The set $Y(N)$ might not belong to $\mathcal{G}^{\mu_Y}$ and $Y(N)$ may have
positive $\mu_Y$-outer measure. Indeed, we provide a simple example of this situation
with the help of the so-called image measure catastrophe (see p. 30 in [129]). Let
$U \subset [0, 1]$ be a nonmeasurable set such that the Lebesgue outer measure $m^*(U) = 1$.
Let $(U, \mathcal{B}_U([0,1]), m|_U)$ be the restriction of the Lebesgue measure $m$ on $U$ i.e. the
Borel $\sigma$-algebra $\mathcal{B}_U([0,1])$ contains all sets $B \cap U$, where $B \in \mathcal{B}([0,1])$, and for such
sets $m|_U(B \cap U) = m(B \cap \tilde{U})$, where $\tilde{U} \in \mathcal{B}([0,1])$ is such that $m^*(U) = m(\tilde{U})$,
say $\tilde{U} = [0,1]$. Take $(\Omega, \Sigma, P)$ to be the completion of $(U, \mathcal{B}_U([0,1]), m|_U)$ and
$F = G = [0,1]$ equipped with its Borel $\sigma$-algebra. Let $Y : U \to [0,1]$ be the identity
and take $X = Y$. Then the image measure $\mu_Y$ is the Lebesgue measure on $[0,1]$.
Moreover, the conditional expectation of the measurable function $\omega \mapsto 1_{[0,1]}(X(\omega))$
given $\sigma(Y)$ is

\[
E[1_{[0,1]}(X) \mid \sigma(Y)](\omega) = 1_{[0,1]}(Y(\omega)),
\]

which is equal to $1_U(Y(\omega))$ $P$-almost surely. However, the reconstructor
$\mathcal{R}_1(1_{[0,1]}, \cdot) : y \mapsto 1_U(y)$ is not $\mu_Y$-measurable on $([0,1], \mathcal{B}([0,1]))$, and the two
reconstructors $\mathcal{R}_1(1_{[0,1]}, \cdot)$ and $\mathcal{R}_2(1_{[0,1]}, \cdot) : y \mapsto 1_{[0,1]}(y)$ differ on the set
$N_0 := [0,1] \setminus U$ which has positive $\mu_Y$-outer measure. Condition 2 helps us to avoid this
small shortcoming. If the reconstructors are $\mu_Y$-measurable, the set
$N_0 = \{y \in G : \mathcal{R}_1(f, y) \neq \mathcal{R}_2(f, y)\} \in \mathcal{G}^{\mu_Y}$ and
$\{\omega : \mathcal{R}_1(f, Y(\omega)) \neq \mathcal{R}_2(f, Y(\omega))\} = Y^{-1}(N_0)$. Then $N_0$ has
zero $\mu_Y$-measure.

A regular conditional distribution is not unique in general because of the non- 
uniqueness of the conditional expectations. For our theoretical considerations, the 
following concept (adapted from [13] in context of regular conditional measures) is 
useful. 

Definition 2.3. We say that a solution $\mu$ of the statistical inverse problem of
estimating the distribution of $X$ given the observation $Y$ is essentially unique if
for any other solution $\tilde{\mu}$ of the same statistical inverse problem there exists a set
$\tilde{G} = \tilde{G}(\mu, \tilde{\mu}) \in \mathcal{G}^{\mu_Y}$ with $\mu_Y(\tilde{G}) = 1$ such that $\mu$ agrees with $\tilde{\mu}$ on
$\mathcal{F} \times \tilde{G}$. Similarly, we say that the posterior distribution $\mu(\cdot, y)$ is essentially
unique if $\mu$ is essentially unique.

In other words, an essentially unique solution $\mu$ may be arbitrary on the sets of
the form $\mathcal{F} \times N$, where $N \subset G$ is a set of $\mu_Y$-measure zero. In a sense, this makes
the posterior distribution $\mu(\cdot, Y(\omega))$ a relevant estimate of the distribution of $X$
with probability 1.

Next, we recall some results on the existence and essential uniqueness of regular 
conditional distributions in Souslin spaces. The existence of regular conditional 
distributions of X given Y has been shown in Lemma 4.2 of [116] (by using the 
definition of the Souslin space and the existence of regular conditional distributions 
on Polish spaces, leading to a universally measurable kernel fi), and in Example 
10.7.5 of [13], where the essential uniqueness has also been verified. The present
"extension" covers $\mu_Y$-measurable solutions. The condensed proof is included only
to support the last sentence, which provides some motivation for the main results
of this work. Namely, the definition of the conditional expectation may give the
impression that we need to specify some random variable $Y$ among all equivalent
random variables for determining the conditional expectation of $1_U(X)$ when an
observation $y = Y(\omega_0) \in G$ has occurred. This is not true as a weaker description
of $Y$ and $X$ suffices.

Theorem 2.4. Let $(F, \mathcal{F})$ and $(G, \mathcal{G})$ be two measurable spaces. Let $X$ be an
$F$-valued random variable and $Y$ be a $G$-valued random variable on a complete
probability space $(\Omega, \Sigma, P)$. If $F$ and $G$ are Souslin spaces equipped with their Borel
$\sigma$-algebras, then there exists an essentially unique solution $\mu : \mathcal{F} \times G \to [0,1]$ of the
statistical inverse problem of estimating the distribution of the unknown $X$ given
the observation $Y$.

The values $\mu(U, y)$ are determined by the joint distribution $\mu_{(X,Y)}$ of $X$ and $Y$
for all $U \in \mathcal{F}$ and $\mu_Y$-almost every $y \in G$.

Proof. First we show that for each $U \in \mathcal{F}$ there exists a solution $\mu_0(U, \cdot) : G \to [0,1]$
such that $y \mapsto \mu_0(U, y)$ is Borel-measurable and $\omega \mapsto \mu_0(U, Y(\omega))$ is a conditional
expectation of $1_U(X)$ given $Y^{-1}(\mathcal{G})$.

Consider the measure space $(F \times G, \mathcal{B}(F \times G), \mu_{(X,Y)})$ and the sub-$\sigma$-algebra
$\mathcal{G}_0 = \{\emptyset, F\} \otimes \mathcal{G}$ generated by the canonical projection $p_2(x, y) = y$ to the second
variable. Recall that the direct products of Souslin spaces are Souslin spaces. Due
to the Souslin property of $F \times G$, there exists a conditional measure
$\mu_0 : \mathcal{B}(F \times G) \times (F \times G) \to [0,1]$ such that $\mu_0(U', \cdot)$ is $\mathcal{G}_0$-measurable for every
$U' \in \mathcal{B}(F \times G)$, the measure $\mu_0(\cdot, (x, y))$ is a probability distribution on
$\mathcal{B}(F \times G)$ for every $(x, y) \in F \times G$, and

\[
\mu_{(X,Y)}(U' \cap V') = \int_{V'} \mu_0(U', (x, y)) \, d\mu_{(X,Y)}(x, y)
\]

for every $U' \in \mathcal{B}(F \times G)$ and $V' \in \mathcal{G}_0$ by Corollary 10.4.6 in [13]. Let us restrict
$\mu_0(U', (x, y))$ on sets $U'$ of the form $U \times G$, where $U \in \mathcal{F}$. Since $\mathcal{G}_0$ is trivial with
respect to the first variable, we may denote the restriction by $\mu_0(U, y)$ where
$U \in \mathcal{F}$ and $y \in G$. Especially, $y \mapsto \mu_0(U, y)$ is $\mathcal{G}$-measurable and

\[
P(X \in U \cap Y \in V) = \mu_{(X,Y)}(U \times V) = \int_V \mu_0(U, y) \, d\mu_Y(y) = \int_{Y^{-1}(V)} \mu_0(U, Y) \, dP
\]

for every $U \in \mathcal{F}$ and $V \in \mathcal{G}$. Therefore, $\mu_0 : \mathcal{F} \times G \to [0,1]$ is a solution of the
statistical inverse problem of estimating the probabilities of the unknown $X$ given
the observation $Y$.

A solution $\mu$ is essentially unique since the Borel $\sigma$-algebra of a Souslin space
is countably generated (see [13]). Indeed, suppose that $\mu$ and $\nu$ are two solutions
in the sense of Definition 2.1. For $U \in \mathcal{F}$, we have that
$\mu(U, Y(\omega)) = E[1_U(X) \mid Y^{-1}(\mathcal{G})](\omega) = \nu(U, Y(\omega))$ $P$-almost surely. Then
$\mu(U, y) = \nu(U, y)$ outside some $\mu_Y$-zero measurable set $N_U \in \mathcal{G}^{\mu_Y}$, since

\[
0 = E[|\mu(U, Y) - \nu(U, Y)|] = \int |\mu(U, y) - \nu(U, y)| \, d\mu_Y(y)
\]

by Lemma 2.2. Every countable algebra $\mathcal{F}_0$ that generates the $\sigma$-algebra $\mathcal{F}$ is
measure-determining, i.e. measures coinciding on $\mathcal{F}_0$ coincide on $\mathcal{F}$ (e.g. Lemma
1.9.4 in [13]). Hence, the two solutions coincide except for $y \in \bigcup_{U \in \mathcal{F}_0} N_U$.

Finally, if $\mu$ is any solution then the values $\mu(U, y)$ are determined by the measure
$\mu_{(X,Y)}$ for all $U \in \mathcal{F}$ and $\mu_Y$-almost all $y \in G$ since $\mu$ coincides with $\mu_0$ on
$\mathcal{F} \times \tilde{G}$ for a set $\tilde{G}$ of full $\mu_Y$-measure (by essential uniqueness) and the values of
$\mu_0(U, \cdot)$ are actually versions of the Radon-Nikodym densities of the measures
$\mu_{(X,Y)}(U \times \cdot)$ with respect to $\mu_Y$ for $\mu_Y$-almost all $y$.
The distribution $\mu_Y$ is the marginal of $\mu_{(X,Y)}$. □

We have reached the usual starting point of nonparametric Bayesian statistics. In 
a conventional Bayesian experiment, one specifies only conditional distributions of 
$Y$ given $X = x$ for all values $x \in F$ (the so-called parametric family of distributions
or sampling distributions) and the prior distribution $\mu_X$ on $(F, \mathcal{F})$ [45, 128], which
together determine the joint distribution of $X$ and $Y$.

Remark 1. The choice of the sample space $(G, \mathcal{G})$ of the random variable $Y$ is usually
not trivial. One might as well choose a larger (or sometimes even a smaller) space
than $G$. The solutions of the statistical inverse problem could, in principle, depend
on the choice of the sample space $(G, \mathcal{G})$ since the conditioning $\sigma$-algebra $Y^{-1}(\mathcal{G})$
depends on the topology of $G$. But since we are working with Souslin spaces this
is not the case. Indeed, if $(G_1, \mathcal{G}_1)$ and $(G_2, \mathcal{G}_2)$ are two Souslin spaces equipped with
their Borel $\sigma$-algebras and $i : G_1 \to G_2$ is a continuous (or just Borel!) injection,
then, quite remarkably, $i^{-1}(\mathcal{G}_2) = \mathcal{G}_1$. Indeed, $i^{-1}(\mathcal{G}_2) \subset \mathcal{G}_1$ by the continuity
of $i$. Moreover, the image of a Borel set under a Borel mapping between Souslin
spaces is a Souslin set i.e. a Souslin space with respect to the relative topology
(see Theorem 6.7.3 in [13]). Therefore, $i(G_1)$, $i(B)$ and $i(G_1 \setminus B)$ are all Souslin
sets in $G_2$ for any Borel set $B \in \mathcal{G}_1$. By injectivity, $i(G_1 \setminus B) = i(G_1) \setminus i(B)$ i.e.
the complement of the Souslin set $i(B)$ in the subspace $i(G_1)$ of $G_2$ is a Souslin
set. By Corollary 6.6.10 in [13], a Souslin set in a Hausdorff space is a Borel set if
its complement is also a Souslin set. Therefore, $i(B)$ is a Borel set in the relative
topology of $i(G_1)$. But each Borel set $i(B)$ in $i(G_1)$ is of the form $i(B) = i(G_1) \cap B'$,
where $B' \in \mathcal{G}_2$. Therefore, $B = i^{-1}(B')$ for some $B' \in \mathcal{G}_2$ which implies that
$\mathcal{G}_1 = i^{-1}(\mathcal{G}_2)$. Consequently, $(i \circ Y)^{-1}(\mathcal{G}_2) = Y^{-1}(\mathcal{G}_1)$ for any $G_1$-valued random
variable $Y$. Therefore, $\mu_1(\cdot, Y(\omega)) = \mu_2(\cdot, i(Y(\omega)))$ $P$-almost surely for any solutions
$\mu_1$ and $\mu_2$ of the inverse problems of estimating the distribution of $X$ given
$Y : \Omega \to G_1$ and $i(Y)$, respectively. If $\tilde{\mu}_2$ is a Borel measurable version of $\mu_2$, then
$\tilde{\mu}_2(U, i(y))$ is $\mathcal{G}_1$-measurable and $\tilde{\mu}_2(U, y') = \mu_2(U, y')$, except possibly on some set
$N$ such that $\mu_{i(Y)}(N) = 0$, which implies that $\mu_2(U, i(y)) = \tilde{\mu}_2(U, i(y))$ except possibly
on the set $i^{-1}(N)$ which has $\mu_Y(i^{-1}(N)) = 0$. Therefore, $\mu_2(\cdot, i(\cdot))$ is also a solution of
the statistical inverse problem of estimating the distribution of $X$ given $Y$. We
are allowed to mis-specify the Souslin sample space $G_1$ by Borel injections without
altering the essentially unique solution. In the general case that involves non-Souslin
spaces, we only know that $i^{-1}(\mathcal{G}_2) \subset \mathcal{G}_1$, where the inclusion may be strict. As an
example, take $G_1$ and $G_2$ to be the sequence space $\ell^\infty$ where we take $\mathcal{G}_1$ to be the
usual Borel $\sigma$-algebra with respect to the supremum norm topology (which is not
separable) and $\mathcal{G}_2$ to be the Borel $\sigma$-algebra with respect to the weak topology, and
take $i$ to be the identity. Then $i^{-1}(\mathcal{G}_2) \neq \mathcal{G}_1$ (see Proposition 2.9 in [147]).

2.2. Partial uniqueness of the solution. From a practical point of view, the
essential uniqueness is not enough since we are given some fixed observation $y_0 \in G$
that might belong to the set where arbitrariness of $\mu$ still rules. Our proposal
for removing this deficiency of the posterior distributions is to proceed as in the
finite-dimensional case, where $\mu$ is required to depend continuously on the second
variable i.e. the posterior distributions depend continuously on the observations.
The following new concept turns out to be useful.

Definition 2.5. Let $(\Omega, \Sigma, P)$ be a complete probability space. Let $F$ and $G$ be
two Souslin spaces equipped with their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively. Let
$X : \Omega \to F$ and $Y : \Omega \to G$ be measurable mappings. Let $\mu$ be a solution of the
statistical inverse problem of estimating the distribution of the unknown $X$ given
the observation $Y$. Let $A \subset G$ and let $\mathcal{F}_0 \subset \mathcal{F}$. We say that a solution $\mu$ is
$\mathcal{F}_0$-continuous on $A$ if the mapping $y \mapsto \mu(U, y)$ is continuous on $A$ with respect to the
relative topology for every $U \in \mathcal{F}_0$.

Consider the set $S \subset G$ of all points $y \in G$ whose every open neighborhood
has positive $\mu_Y$-measure. On Souslin spaces such a set $S$ is known to coincide
with the topological support of $\mu_Y$, i.e. the smallest closed set $S \subset G$ such that
$\mu_Y(S) = 1$.

Lemma 2.6. Let $\nu$ be a Borel probability measure on a Souslin space $G$. The
topological support of $\nu$ exists and it consists of exactly those $y \in G$ whose every
open neighborhood has positive measure.

Proof. See Theorem 2.1 in [113], which generalizes to Souslin spaces, since Souslin
spaces are hereditarily Lindelöf by Lemma 6.6.4 of [13]. □

We obtain partial uniqueness of the posterior distributions by using the continuity
of solutions on certain subsets of the topological support of $\mu_Y$. We denote by
$A^\circ$ the interior of $A$.

Theorem 2.7. Let $F$ and $G$ be Souslin spaces equipped with their Borel $\sigma$-algebras
$\mathcal{F}$ and $\mathcal{G}$ respectively. Let $X$ be an $F$-valued and $Y$ be a $G$-valued random variable
on a complete probability space $(\Omega, \Sigma, P)$. Let $A \in \mathcal{G}^{\mu_Y}$ be a subset of the topological
support $S$ of $\mu_Y$ such that either $A \subset \overline{A^\circ}$ or $\mu_Y(A) = 1$. Let $\mathcal{F}_0 \subset \mathcal{F}$ be a
measure-determining class.

All solutions of the statistical inverse problem of estimating the probabilities of
$X$ given $Y$ that are $\mathcal{F}_0$-continuous on $A$ coincide on $\mathcal{F} \times A$.

Proof. Assume that $\mu_1$ and $\mu_2$ are two solutions that have the described properties.
If $\mu_1 \neq \mu_2$ on $\mathcal{F} \times A$ then there exist $y_0 \in A$ and $U_0 \in \mathcal{F}$ such that
$\mu_1(U_0, y_0) \neq \mu_2(U_0, y_0)$, say $\mu_1(U_0, y_0) - \mu_2(U_0, y_0) > \epsilon$ for some $\epsilon > 0$.
Since $\mu_i(\cdot, y_0)$, $i = 1, 2$, are measures, the set $U_0$ can be taken to be from the
measure-determining class $\mathcal{F}_0$.

The function $f : A \to \mathbb{R}$ defined as $f(y) = \mu_1(U_0, y) - \mu_2(U_0, y)$ is continuous
in the relative topology of $A$ and positive at $y_0$. The set $f^{-1}((\epsilon, \infty))$ is therefore a
non-empty open neighborhood of $y_0$ in the relative topology of $A$, and there exists
a non-empty open set $V \subset G$ such that $V \cap A = f^{-1}((\epsilon, \infty))$. The set $V \cap A$ has
positive $\mu_Y$-measure. Indeed, if $\mu_Y(A) = 1$, then $\mu_Y(V \cap A) = \mu_Y(V) > 0$ by
Lemma 2.6, since $y_0 \in V \cap A$ also belongs to the support of $\mu_Y$. On the other hand,
if $A \subset \overline{A^\circ}$, the neighborhood $V$ of $y_0$ also contains points from $A^\circ$. It follows that
$V \cap A$ contains a non-empty open set $V \cap A^\circ$. By Lemma 2.6, $\mu_Y(V \cap A) > 0$.
This implies that $\mu_1(U_0, y) - \mu_2(U_0, y) > \epsilon$ on a set $f^{-1}((\epsilon, \infty)) \in \mathcal{G}^{\mu_Y}$ of positive
$\mu_Y$-measure. Therefore, it is impossible that both mappings $\mu_1$ and $\mu_2$ satisfy
the requirements of Definition 2.1; in particular, the property

\[
\int_{f^{-1}((\epsilon, \infty))} \mu_i(U_0, y) \, d\mu_Y(y) = P(X \in U_0 \cap Y \in f^{-1}((\epsilon, \infty))), \quad i = 1, 2,
\]

of conditional expectations does not hold. Hence, the two solutions necessarily
coincide on $\mathcal{F} \times A$. □

Remark 2. Recall that a Borel measure is called strictly positive if it is positive
on all non-empty open subsets. Then the topological support of the measure is the
whole space. When $\mu_Y$ is strictly positive, the partial uniqueness holds on $\mathcal{F} \times G$ for
the solutions that are $\mathcal{F}_0$-continuous on $G$. If $\mu_Y$ is strictly positive and the solution
$\mu$ is $\mathcal{F}_0$-continuous on some non-empty open subset $A$ of $G$, we similarly get the
uniqueness on $\mathcal{F} \times A$. This situation is often encountered in finite-dimensional
statistical inverse problems, where one usually excludes those $y \in G$ for which the
continuous probability density function of $Y$ vanishes.

Remark 3. The partial uniqueness of the solution is obtained by fixing the topology
of the space of observations. However, the topology of a Souslin space is a slightly
ambiguous concept in the measure theoretical sense. Namely, it is well-known that
different topologies can generate the same Borel sets. For example, any Borel
measurable function on a Souslin space is continuous with respect to some stronger
topology that makes the space Souslin and generates the same Borel sets as the
original topology (see Exercise 6.10.62 in [13] for the proof). If $\mu$ is $\mathcal{F}_0$-continuous
on a Souslin space and $\tilde{\mu}$ is its Borel-measurable version that is not continuous, then
both are continuous with respect to some stronger topology that also makes $\tilde{\mu}$
continuous. We remark that although the essentially unique solutions are invariant
under injective continuous mappings between Souslin spaces (see Remark 1), the
strengthening of the topology can affect the partial uniqueness of the solution e.g.
by diminishing the topological support.

Due to the properties of the conditional expectation, the prior distribution $\mu_X(U)$
is the mixture $\int \mu(U,y)\,d\mu_Y(y)$ of all posterior distributions, so that the prior
probability of $U$ vanishes exactly when $\mu_Y$-almost all posterior probabilities of $U$ vanish.
When $\mu$ is regular enough, we get the following converse result, which contrasts
nicely with the well-known representation theorem considered in the next section.

Theorem 2.8. Let $F$ and $G$ be Souslin spaces equipped with their Borel $\sigma$-algebras
$\mathcal{F}$ and $\mathcal{G}$, respectively. Let $X$ be an $F$-valued and $Y$ be a $G$-valued random variable on
a complete probability space $(\Omega, \Sigma, P)$. Let $A \in \mathcal{G}^{\mu_Y}$ be any subset of the topological
support $S$ of $\mu_Y$ such that either $A \subset \overline{A^\circ}$ or $\mu_Y(A) = 1$. If $\mu$ is a solution of
the statistical inverse problem of estimating probabilities of $X$ given $Y$ that is
$\mathcal{F}$-continuous on $A$, then the posterior distribution $\mu(\cdot,y)$ at any $y \in A$ is absolutely
continuous with respect to the prior distribution.

Proof. Assume that $\mu_X(U) = 0$ for some $U \in \mathcal{F}$. According to the definition of
conditional expectation,

$$\int \mu(U,y)\,d\mu_Y(y) = P(X \in U) = \mu_X(U), \tag{6}$$

which now vanishes. Since the solution is non-negative, we get that $\mu(U,y) = 0$
$\mu_Y$-almost surely on $G$. Since $y \mapsto \mu(U,y)$ is continuous on $A$, the set $V = \{y \in A :
\mu(U,y) > 0\}$ is relatively open, i.e. there exists an open set $V' \subset G$ such that
$V' \cap A = V$. Suppose $V$ is non-empty. Similarly as in the proof of Theorem 2.7,
$\mu_Y(V) = \mu_Y(V') > 0$ when $A$ has full measure and $\mu_Y(V) > 0$ when $A \subset \overline{A^\circ}$. But
this contradicts (6) because $\mu_X(U) = 0$. Thus the set $V$ is empty and $\mu(U,y) = 0$
for all $y$ from $A$. □
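The mixture identity (6) and the conclusion of Theorem 2.8 can be checked by hand in a finite setting. The following is only a minimal sketch under our own assumptions: the joint law `joint` on a 3 x 2 grid is hypothetical, and every subset of a finite space is trivially a continuity set, so the example illustrates the statement rather than the topological argument of the proof.

```python
from fractions import Fraction

# Hypothetical joint law P(X = x, Y = y); the point x = 2 has prior mass zero.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
    (2, 0): Fraction(0),    (2, 1): Fraction(0),
}
X_VALUES = (0, 1, 2)
Y_VALUES = (0, 1)


def marginal_y(y):
    """mu_Y({y}): the law of the observation."""
    return sum(joint[(x, y)] for x in X_VALUES)


def prior(U):
    """mu_X(U): the prior probability of a set U of x-values."""
    return sum(joint[(x, y)] for x in U for y in Y_VALUES)


def posterior(U, y):
    """mu(U, y): the posterior probability of U given Y = y."""
    return sum(joint[(x, y)] for x in U) / marginal_y(y)


def mixture(U):
    """Left-hand side of (6): the integral of mu(U, y) against mu_Y."""
    return sum(posterior(U, y) * marginal_y(y) for y in Y_VALUES)


print(prior({0}), mixture({0}))                   # the two values agree
print([posterior({2}, y) for y in Y_VALUES])      # a prior-null set stays null
```

Exact rational arithmetic is used so that the equality in (6) holds without rounding error.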

When Theorem 2.8 holds for $A = S$, every Borel set $B$ with full prior probability
has also full posterior probability $\mu(B,y) = 1$ for all $y \in S$. Our posterior perception
of the unknown appears to be in line with our prior insight in this respect.

Remark 4. According to a result of Macci [103], the absolute continuity of the
posterior distributions $\mu(\cdot,y)$ with respect to the prior distribution for $\mu_Y$-almost
every $y$ implies that the conditional distribution of $Y$ given $X = x$ is absolutely
continuous with respect to (= dominated by) $\mu_Y$ for $\mu_X$-a.e. $x \in F$. By Theorem 2.8,
the posterior probabilities of Borel sets may depend continuously on the observations
$y \in G$ only in the dominated cases, i.e. the conditional distribution of $Y$ given
$X = x$ has to be absolutely continuous with respect to some $\sigma$-finite measure for
$\mu_X$-a.e. $x \in F$. The same conclusion holds even if the space $G$ is replaced with
some subset $A \in \mathcal{G}^{\mu_Y}$ having full $\mu_Y$-measure. In the undominated cases, the
posterior distribution necessarily has a large amount of discontinuities: the set of
all discontinuity points must have positive $\mu_Y$-measure.

A partial converse to Theorem 2.8 holds in complete separable metric spaces. 

Theorem 2.9. Let $F$ and $G$ be complete separable metric spaces equipped with
their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively. Let $X$ be an $F$-valued and $Y$ be a $G$-valued
random variable on a complete probability space $(\Omega, \Sigma, P)$. Let $\mu$ be a solution of the
statistical inverse problem of estimating the distribution of $X$ given $Y$. If the family
of the posterior distributions $\{\mu(\cdot,y) : y \in G\}$ is dominated by a Borel measure
$\nu$ on $\mathcal{F}$, then for every $\epsilon > 0$ there exists a compact set $K = K(\epsilon,\mu) \in \mathcal{G}$ such
that $\mu_Y(G \setminus K) < \epsilon$ and $\mu$ is $\mathcal{F}_0$-continuous on $K$ for some family $\mathcal{F}_0$ of
measure-determining sets.

Proof. Equip the space $M$ of all probability measures on $(F, \mathcal{F})$ with the topology
of the weak convergence (i.e. convergence of integrals of all bounded continuous
functions). It is well-known that this space is a complete separable metric space
whenever $F$ is a complete separable metric space (see Theorem 8.9.5 in [13]). Equip
$M$ with the Borel $\sigma$-algebra $\mathcal{M}$ with respect to the weak topology, i.e. the cylinder
set $\sigma$-algebra of the sets of the type

$$\{\nu \in M : (\nu(f_1), \nu(f_2), \dots) \in B\},$$

where $f_i$, $i \in \mathbb{N}$, are continuous bounded functions on $F$ and $B \in \mathcal{B}(\mathbb{R}^\infty)$ (with
respect to the coordinate-wise convergence). By Condition 2 of Definition 2.1,
the solutions $y \mapsto \mu(\cdot,y)$ are $\mu_Y$-measurable mappings from $G$ to $M$ since $y \mapsto
\mu(f_i,y)$ is $\mu_Y$-measurable as a pointwise limit of integrals of simple functions. By
the Lusin theorem (see Theorem 7.1.13 in [12]), there exists a family $\mathcal{K}$ of compact
subsets of $G$ such that given any $\epsilon > 0$, the probability $\mu_Y(K^c) < \epsilon$ for some
$K \in \mathcal{K}$ and the measure-valued random variable $y \mapsto \mu(\cdot,y)$ is continuous on $K$ in
the weak topology of measures, implying that $\lim_{i\to\infty} \mu(f,y_i) = \mu(f,y)$ for every
bounded continuous $f$ whenever $\lim_{i\to\infty} y_i = y$ in $K$. Especially, the mappings
$y \mapsto \mu(U,y)$ are $\mathcal{F}_0$-continuous on
$K$, where $\mathcal{F}_0$ consists of all Borel sets $U$ whose boundary satisfies $\mu(\partial U, y) = 0$
for all $y \in K$. This follows from the fact that $\lim_{i\to\infty} \mu(U,y_i) = \mu(U,y)$ whenever
$\lim_{i\to\infty} y_i = y$, by the weak convergence (see Corollary 8.2.10 in [13]). If the family
of posterior distributions $\{\mu(\cdot,y) : y \in K\}$ is dominated by some Borel measure,
then $\mathcal{F}_0$ is a measure-determining set (see Lemma 1.9.4 and Proposition 8.2.8 in
[13]). □

3. The representation of posterior distributions 

In this section, we consider a known representation formula (see Section 1.2.2 in
[45], Theorem 1.31 in [128], or pp. 231-232 in [132]) for the solution of the statistical
inverse problem of estimating the distribution of $X$ given the observations $Y$ that
generalizes the finite-dimensional formula

$$D(x \mid y) = C\,D(y \mid x)\,D_{pr}(x).$$

For the reader's convenience, the proofs of Lemma 3.1, Lemma 3.2 and Theorem 3.3 are
included, although they are special cases of more general known results.
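The finite-dimensional formula $D(x \mid y) = C\,D(y \mid x)\,D_{pr}(x)$ can be illustrated on a grid. The sketch below is our own illustration and not part of the cited results: it assumes a scalar unknown with standard Gaussian prior density, the identity as the direct map, Gaussian noise with a hypothetical variance `noise_var`, and a simple Riemann sum for the normalizing constant $C$. In this conjugate case the posterior mean is known to be $y/(1 + \text{noise\_var})$, which gives a sanity check.

```python
import math


def gauss(t, var):
    """Density of a centered Gaussian with variance var, evaluated at t."""
    return math.exp(-t * t / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def grid_posterior(y, noise_var, lo=-10.0, hi=10.0, n=4001):
    """Return grid points, posterior density values and the grid step.

    The unnormalized posterior is D(y|x) * D_pr(x); the constant C is
    obtained from a Riemann sum over the (assumed) grid [lo, hi].
    """
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    unnorm = [gauss(y - x, noise_var) * gauss(x, 1.0) for x in xs]
    C = 1.0 / (h * sum(unnorm))
    return xs, [C * u for u in unnorm], h


xs, post, h = grid_posterior(y=1.5, noise_var=0.5)
post_mean = h * sum(x * d for x, d in zip(xs, post))
print(post_mean, 1.5 / (1.0 + 0.5))  # grid value next to the closed form
```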

Throughout the section, we assume that $F$ and $G$ are locally convex Souslin
topological vector spaces equipped with their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively,
and $X$ is taken to be an $F$-valued random variable and $\varepsilon$ is taken to be a $G$-valued
random variable statistically independent from $X$. All the random variables are
defined on the same complete probability space $(\Omega, \Sigma, P)$. The mapping $L : F \to G$
is assumed to be continuous. We denote $Y = L(X) + \varepsilon$.

First, we check that $Y$ is indeed a random variable as a combination of Borel
measurable mappings. The product space $F \times G$ is equipped with the usual product
$\sigma$-algebra $\mathcal{F} \otimes \mathcal{G}$ generated by rectangles $U \times V$, where $U \in \mathcal{F}$ and $V \in \mathcal{G}$.

Lemma 3.1. The mapping $T : (x,z) \mapsto L(x) + z$ is Borel measurable from $F \times G$
to $G$.

Proof. As the addition is just $\mathcal{B}(G \times G)$-measurable by continuity, there is the
question whether the Borel $\sigma$-algebra $\mathcal{B}(G \times G)$ of the topological product space
coincides with the product $\sigma$-algebra $\mathcal{G} \otimes \mathcal{G}$ generated by the rectangles $V \times W$,
where $V, W \in \mathcal{G}$.

Certainly, $\mathcal{G} \otimes \mathcal{G} \subset \mathcal{B}(G \times G)$, since the products of open sets $V, W \subset G$ form a
basis of topology for $G \times G$.

Due to the Souslin property, the space $G \times G$ is hereditarily Lindelöf ([13], Lemma
6.6.4 and Lemma 6.6.5). Any open set in $G \times G$ can therefore be expressed as a
countable union of sets of the form $V \times W$, where $V, W \subset G$ are open. Hence
$\mathcal{B}(G \times G) \subset \mathcal{G} \otimes \mathcal{G}$. □

We verify now that for any $\mu_Y$-integrable $f : G \to \mathbb{R}$, the conditional expectation
of $f(Y)$ given $X$ is the random variable

$$E[f(Y) \mid X](\omega) = \int_G f(z)\,d\mu_{\varepsilon+L(X(\omega))}(z).$$

Here the measure $\mu_{\varepsilon+L(X(\omega))}$ is the image measure of the random variable $\omega' \mapsto
\varepsilon(\omega') + L(X(\omega))$, where $X(\omega)$ is treated as a constant. We apply the following more
general claim, for which we failed to find a reference.

Lemma 3.2. Let $Z_1$ be an $F$-valued and $Z_2$ be a $G$-valued random variable that are
statistically independent. Denote $Z_3 = T(Z_1, Z_2)$, where $T : F \times G \to G$ is a Borel
measurable mapping. For any $\mu_{Z_3}$-integrable function $f : G \to \mathbb{R}$, it holds that

$$E[f(Z_3) \mid Z_1](\omega) = \int_G f(z)\,d\mu_{T(Z_1(\omega),Z_2)}(z)$$

$P$-almost surely, and $\int_G f(z)\,d\mu_{T(Z_1(\omega),Z_2)}(z)$ is a version of the conditional
expectation of $f(Z_3)$ given $\sigma(Z_1)$.

Proof. We show that the claim holds for a Borel measurable version of $f$, which
exists by Proposition 2.1.11 in [13]. The generalization for $\mu_{Z_3}$-measurable functions
follows then from Lemma 2.2.

Remark that $f \circ T : F \times G \to \mathbb{R}$ is then a Borel measurable function. We will show
that $E[g(Z_1,Z_2) \mid Z_1](\omega) = \int_G g(Z_1(\omega), z_2)\,d\mu_{Z_2}(z_2)$ holds for all Borel measurable
simple functions $g$ on $F \times G$. The usual approximation of Borel measurable functions
with simple functions implies then for $g = f \circ T$ that

$$E[f(T(Z_1,Z_2)) \mid Z_1](\omega) = \int_G f(T(Z_1(\omega), z_2))\,d\mu_{Z_2}(z_2) = E[f(T(Z_1(\omega), Z_2))]
= \int_G f(z)\,d\mu_{T(Z_1(\omega),Z_2)}(z).$$

Take now $g = 1_C$, where $C \in \mathcal{B}(F \times G)$. We need to determine the conditional
expectation $E[1_C(Z_1,Z_2) \mid Z_1]$, i.e. the conditional distribution of $(Z_1, Z_2)$ given $\sigma(Z_1)$.
Since $F$ and $G$ are Souslin spaces, a regular conditional measure exists (by
Corollary 10.4.6 in [13]) and is determined by values on any measure-determining sets.
In Souslin spaces, the rectangular sets $C = B_1 \times B_2$, where $B_1 \in \mathcal{F}$ and $B_2 \in \mathcal{G}$,
are measure-determining sets, since $\mathcal{B}(F \times G) = \mathcal{F} \otimes \mathcal{G}$ (see the proof of Lemma 3.1
and Lemma 1.9.4 in [13]). By the properties of the conditional expectation,

$$E[1_{B_1 \times B_2}(Z_1,Z_2) \mid Z_1](\omega) = 1_{B_1}(Z_1(\omega)) \int 1_{B_2}(z_2)\,d\mu_{Z_2}(z_2)
= \int 1_{B_1 \times B_2}(Z_1(\omega), z_2)\,d\mu_{Z_2}(z_2).$$

□
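For finitely supported random variables, the formula of Lemma 3.2 reduces to a finite sum and can be compared directly with the elementary definition of the conditional expectation. The sketch below is an illustration under our own assumptions: the laws `p1` and `p2`, the map `T` (which mimics $L(x) + z$ with $L(x) = 2x$) and the test function `f` are hypothetical choices.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical laws of the independent variables Z1 and Z2.
p1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
p2 = {-1: Fraction(1, 2), 0: Fraction(1, 4), 2: Fraction(1, 4)}


def T(z1, z2):
    """Plays the role of L(x) + z with the assumed choice L(x) = 2x."""
    return 2 * z1 + z2


def f(z):
    return z * z


def cond_exp_formula(z1):
    """Integral of f against the image measure of z2 -> T(z1, z2)."""
    return sum(f(T(z1, z2)) * w for z2, w in p2.items())


def cond_exp_from_joint(z1):
    """E[f(Z3) 1{Z1 = z1}] / P(Z1 = z1), computed from the joint law of (Z1, Z3)."""
    joint = defaultdict(Fraction)
    for a, wa in p1.items():
        for b, wb in p2.items():
            joint[(a, T(a, b))] += wa * wb
    num = sum(f(z3) * w for (a, z3), w in joint.items() if a == z1)
    return num / p1[z1]


for z1 in p1:
    print(z1, cond_exp_formula(z1), cond_exp_from_joint(z1))
```

Here the value `z1` is frozen exactly as $Z_1(\omega)$ is treated as a constant in the lemma.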



Here is the description of the solutions $\mu(U,z)$ modulo $\mu_Y$-zero measurable sets.
The result is a special case of the Kallianpur-Striebel formula [79].

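Theorem 3.3. Let $\mu_{\varepsilon+L(x)}$ be absolutely continuous with respect to a $\sigma$-finite
measure $\nu$ for $\mu_X$-a.e. $x \in F$. Set

$$\rho(x,z) = \begin{cases} \dfrac{d\mu_{\varepsilon+L(x)}}{d\nu}(z) & \text{when } \mu_{\varepsilon+L(x)} \ll \nu, \\ 1 & \text{otherwise.} \end{cases}$$

If $\rho(x,z)$ is a non-negative $\mu_X \times \nu$-measurable function on $F \times G$, then there is
an essentially unique solution $\mu$ of the statistical inverse problem of estimating the
distribution of $X$ given $Y = L(X) + \varepsilon$ such that

$$\mu(U,z) = \frac{\int 1_U(x)\rho(x,z)\,d\mu_X(x)}{\int \rho(x,z)\,d\mu_X(x)} \tag{7}$$

for all $z \in G \setminus N_0$, where the set

$$N_0 = \Big\{z \in G : \int \rho(x,z)\,d\mu_X(x) = 0 \text{ or } \infty\Big\}$$

has $\mu_Y$-measure zero.

If $\mu_Y(N) = 0$ then $N$ is also $\mu_{\varepsilon+L(x_0)}$-zero measurable for $\mu_X$-almost every
$x_0 \in F$. If additionally $\mu_{\varepsilon+L(x)} \ll \nu$ for all $x \in F$ and $\rho$ is positive $\mu_X \times \nu$-almost
everywhere, then $\mu_{\varepsilon+L(x_0)}(N) = 0$ for all $x_0 \in F$.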

Proof. Let $\mu(U,z)$ be defined by (7). If $z \in N_0$, we set $\mu(U,z) = 1_U(x_0)$ for some
fixed $x_0 \in F$. We prove that $\mu$ is a solution.

Let $U \in \mathcal{F}$ and $V \in \mathcal{G}$. By Theorem 2.4 there exists an essentially unique solution,
which we denote here with $\tilde{\mu}$. We write two expressions for $P(X \in U \cap Y \in V)$
using Lemma 3.2. The first is

$$E[1_U(X)\,E[1_V(Y) \mid X]] = \int 1_U(x) \Big( \int 1_V(z)\,d\mu_{\varepsilon+L(x)}(z) \Big) d\mu_X(x)$$
$$= \int 1_U(x) \Big( \int 1_V(z)\rho(x,z)\,d\nu(z) \Big) d\mu_X(x) \tag{8}$$
$$= \int 1_V(z) \Big( \int 1_U(x)\rho(x,z)\,d\mu_X(x) \Big) d\nu(z)$$

and the second expression is

$$E[1_V(Y)\,E[1_U(X) \mid Y]] = E[1_V(Y)\,\tilde{\mu}(U,Y)] = E[E[1_V(Y)\,\tilde{\mu}(U,Y) \mid X]]$$
$$= \int \int 1_V(z)\rho(x,z)\tilde{\mu}(U,z)\,d\nu(z)\,d\mu_X(x) \tag{9}$$
$$= \int 1_V(z) \Big( \int \rho(x,z)\,d\mu_X(x) \Big) \tilde{\mu}(U,z)\,d\nu(z).$$

The measurability of $\rho$ is used in changing the order of integrations by the Fubini
theorem. The integrability of $\rho$ follows automatically from the finiteness of the
left-hand side of (9) for $U = F$ and $V = G$.

Since the equality of (8) and (9) holds for all $V \in \mathcal{G}$, we obtain

$$\tilde{\mu}(U,z) \int \rho(x,z)\,d\mu_X(x) = \int 1_U(x)\rho(x,z)\,d\mu_X(x)$$

for $\nu$-almost every $z$. Hence, $\mu(U,z) = \tilde{\mu}(U,z)$ for $\nu$-almost every $z$ such that
$0 < \int \rho(x,z)\,d\mu_X(x) < \infty$.

The denominator in (7) may vanish only on a set $A$ of $\mu_Y$-measure zero since
the choice $U = F$, $V = A$ gives $\mu_Y(A) = 0$ in (8). The same consideration implies
that also the measure $\mu_Y$ is absolutely continuous with respect to $\nu$. Similarly, the
denominator is finite $\nu$-almost surely, which implies $\mu_Y$-almost surely. We conclude
that $N_0$ has $\mu_Y$-measure zero and $\mu(U,z) = \tilde{\mu}(U,z)$ $\mu_Y$-almost surely. Then $\mu(U,y)$
satisfies Condition 2 of Definition 2.1. By Lemma 2.2, $\mu$ satisfies Condition 1. By
the integrability of $\rho$, $\mu$ satisfies Condition 3 of Definition 2.1.

We proceed to the last claim. Taking $V = N$ and $U = F$ in (8) implies that
$\mu_{\varepsilon+L(x)}(N) = \int 1_N(z)\rho(x,z)\,d\nu(z)$ vanishes for $\mu_X$-almost all $x \in F$. When $\rho$ is a.e.
positive, also $\nu(N)$ has to vanish. We obtain $\mu_{L(x_0)+\varepsilon}(N) = 0$ for all $x_0 \in F$ by
using the absolute continuity.

□

The last statement of the above theorem is added to show how small the zero
measurable set for a given unknown is. The representation formula does not improve
the essential uniqueness of solutions, because the Radon-Nikodym density $z \mapsto
d\mu_{L(x)+\varepsilon}/d\nu(z)$ is only determined up to $\nu$-equivalence. It should be noted that
under the domination assumptions on $\mu_{\varepsilon+L(x)}$ in Theorem 3.3, there always exist
versions of the Radon-Nikodym densities that are jointly measurable. In [79], this
claim is proved assuming that $Y^{-1}(\mathcal{G})$ is countably generated. In Souslin spaces,
the Borel $\sigma$-algebras are countably generated by Corollary 6.7.5 in [13].

It is easy to see that the prior distribution $\mu_X$ and the posterior distribution
$\mu(\cdot,z)$ are equivalent if (7) holds and $\rho(\cdot,z) > 0$ $\mu_X$-almost everywhere.

Remark 5. The existence of $\nu$ is a delicate matter. For example, the measure
$\mu_{\varepsilon+L(x)}$ may not be almost surely absolutely continuous with respect to $\mu_Y$,
although

$$\mu_Y(U) = E[1_U(Y)] = E[E[1_U(Y) \mid X]] = \int \mu_{\varepsilon+L(x)}(U)\,d\mu_X(x)$$

by Lemma 3.2. We can only conclude that $\mu_{\varepsilon+L(x)}(U)$ vanishes $\mu_X$-a.s. whenever
$\mu_Y(U)$ vanishes and the $\mu_X$-zero measurable set may depend on $U$. A Gaussian
example in Remark 10 of Section 5 shows that this is indeed the case. In general,
the Halmos-Savage theorem (Lemma 7 in [59]) states that from a dominated family
of finite measures, which in our case is $\{\mu_{\varepsilon+L(x)} : x \in M\}$ where $\mu_X(M) = 1$, one
can pick out countably many measures $\mu_{\varepsilon+L(x_i)}$ in such a way that the measure
$\nu := \sum_i c_i \mu_{\varepsilon+L(x_i)}$, where $\sum_i c_i = 1$ and $c_i > 0$, is not only a dominating
measure but also equivalent to the family $\{\mu_{\varepsilon+L(x)} : x \in M\}$ (i.e. the measures
in the family vanish on the same subsets as $\nu$). Especially, this gives a necessary
and sufficient condition for the domination of the probability measures $\mu_{\varepsilon+L(x)}$. In
Section 5 we concentrate on special cases where $\mu_\varepsilon$ can be taken as $\nu$. In these
examples, we require that $\mu_{\varepsilon+L(x)}(U) = 0$ whenever $\mu_\varepsilon(U) = 0$, i.e. $\mu_\varepsilon$ is
quasi-invariant with respect to translations with $L(x)$, where $x \in F$. This allows the use
of any prior distribution on $F$. However, Remark 7 in Section 5 demonstrates that
in dominated cases it is not always possible to choose $\mu_\varepsilon$ as $\nu$.

We return to the question of partial uniqueness (Theorem 2.7). The conditions
in the next theorems allow easier validation of the measurability and guarantee
some continuity for the solutions. However, under the stronger assumption that
the function $(x,z) \mapsto \rho(x,z)$ is jointly continuous and bounded, the solution $\mu$ is
always $\mathcal{F}$-continuous on $G$ (see Theorem 7.14.8 in [13]). Recall that the class of all
Souslin sets is quite large since all Borel subsets of a Souslin space are Souslin sets
by Corollary 6.6.7 in [13].

Theorem 3.4. Let $\mu_{\varepsilon+L(x)}$ be absolutely continuous with respect to a $\sigma$-finite
measure $\nu$ for $\mu_X$-almost every $x \in F$. If

$$\rho(x,z) = \frac{d\mu_{\varepsilon+L(x)}}{d\nu}(z)$$

is a separately continuous function on some $F_0 \times A$, where $F_0$ is a Souslin subset of
$F$ with full $\mu_X$-measure and $A$ is a Souslin subset of $G$ such that $\nu(A^c) = 0$, then
$\rho$ is $\mu_X \times \nu$-measurable.

If additionally $\sup_{z \in K} \rho(x,z) \in L^1(\mu_X)$ for all compact sets $K \subset G$, then

$$z \mapsto \mu(U,z) = \frac{\int 1_U(x)\rho(x,z)\,d\mu_X(x)}{\int \rho(x,z)\,d\mu_X(x)} \tag{10}$$

is $\mathcal{F}$-continuous on $K \cap \{z \in A : 0 < \int \rho(x,z)\,d\mu_X(x) < \infty\}$ for every compact set
$K \subset G$.

Proof. Assume that $\rho$ is separately continuous. Since $F_0$ is a Souslin space, there
exists a continuous surjection $R$ from some complete separable metric space $M$ onto
$F_0$. We consider first the function $(m,z) \mapsto \rho(R(m),z)$ on $M \times A$. This function
is a pointwise limit of continuous functions due to a theorem of W. Rudin [124].
Hence the function $(m,z) \mapsto \rho(R(m),z)$ is $\mathcal{B}(M \times A) = \mathcal{B}(M) \otimes \mathcal{B}(A)$-measurable.
We compose it with a $\mu_X \times \nu$-measurable mapping $(R^{-1}, I)$, where the inverse comes
from the measurable choice theorem (see Theorem 6.9.1 in [13]; note that Souslin
sets are universally measurable, i.e. measurable with respect to any finite Radon
measure, by Theorem 7.4.1 in [13]). Then we see that $1_A(z)1_{F_0}(x)\rho(x,z)$, together
with its equivalent mapping $\rho(x,z)$, is $\mu_X \times \nu$-measurable.

By the Lebesgue dominated convergence theorem, we obtain the sequential
continuity of the marginals. On Souslin spaces, the compact sets are metrizable (see
Corollary 6.7.8 in [13]). In metrizable spaces sequential continuity coincides with
continuity. □

The above Theorem 3.4 shows $\mathcal{F}$-continuity of the solution $\mu$ on $\{z \in A : 0 <
\int \rho(x,z)\,d\mu_X(x) < \infty\}$ when $G$ is e.g. a $k$-space, i.e. a subset $C$ of $G$ is closed if
and only if $C \cap K$ is closed for every compact $K \subset G$ (see Definition 43.8 in [157]).
Indeed, it is well-known that a function $f$ is continuous on a $k$-space if and only
if it is continuous on every compact subset. In particular, this holds for all
first-countable spaces, like metric spaces. Note that the space of tempered distributions
$\mathcal{S}'(\mathbb{R}^n)$ is a $k$-space when equipped with its strong topology but not with its weak
topology, while the distribution space $\mathcal{D}'(U)$ is not a $k$-space with respect to either
topology [61]. But $\mathcal{D}'(U)$ is a Lusin space [129], i.e. a Hausdorff space that is
a continuous injective image of a complete metric space, and can be equipped
with a stronger metrizable topology inherited from the metric space. However, this
topology depends on the chosen metric space and has all the drawbacks indicated
in Remark 3.

We combine Theorem 2.7 and Theorem 3.4 in a simple case. 

Corollary 1. Let $G$ be a $k$-space. Let $\mu_{\varepsilon+L(x)}$ be equivalent with a probability
measure $\nu$ for every $x \in F$. Denote by $S_\nu$ the topological support of $\nu$. If

$$\rho(x,z) = \frac{d\mu_{\varepsilon+L(x)}}{d\nu}(z)$$

is a separately continuous function on $F \times S_\nu$, and if $\sup_{z \in K} \rho(x,z) \in L^1(\mu_X)$ for
all compact subsets $K \subset G$, then all solutions of the statistical inverse problem of
estimating the distribution of $X$ given $Y$ that are $\mathcal{F}$-continuous on $\{z \in S_\nu : 0 <
\int \rho(x,z)\,d\mu_X(x) < \infty\}$ coincide with

$$z \mapsto \mu(U,z) = \frac{\int 1_U(x)\rho(x,z)\,d\mu_X(x)}{\int \rho(x,z)\,d\mu_X(x)} \tag{11}$$

on $\mathcal{F} \times \{z \in S_\nu : 0 < \int \rho(x,z)\,d\mu_X(x) < \infty\}$.

The proof is an immediate consequence of the following lemma, where we
characterize the topological support of $\mu_Y$ in more convenient terms.

Lemma 3.5. Let $Y = L(X) + \varepsilon$, where $X$ and $\varepsilon$ are statistically independent. The
topological support of $\mu_Y$ is the smallest closed set $S \subset G$ such that $\mu_{\varepsilon+L(x)}(S) = 1$
for $\mu_X$-almost every $x \in F$. Moreover, if $\mu_{\varepsilon+L(x)}$ is equivalent with a probability
measure $\nu$ for every $x \in F$, then the topological supports of $\mu_Y$ and $\nu$ coincide.

Proof. The first claim follows from the convolution $\mu_Y(S) = \int \mu_{\varepsilon+L(x)}(S)\,d\mu_X(x)$.
For the second claim, we note that $\mu_{\varepsilon+L(x)}(S) = \nu(S)$ for every closed set $S \subset G$
with full $\nu$-measure (or full $\mu_{\varepsilon+L(x)}$-measure). □

4. Converging approximations 

Throughout this section, we use the following assumptions.

Definition 4.1. We say that Assumption A holds, if the following four conditions
are satisfied.

(1) Topological spaces $F$ and $G$ are locally convex Souslin topological vector
spaces equipped with their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively.

(2) The triple $(\Omega, \Sigma, P)$ is a complete probability space, $X$ and $X_n$ are $F$-valued
random variables on $\Omega$ and $\varepsilon$ is a $G$-valued random variable on $\Omega$. The random
variables $X$ and $\varepsilon$ are independent. The random variables $X_n$ and $\varepsilon$ are
independent.

(3) The mapping $L : F \to G$ is continuous, and we denote $Y = L(X) + \varepsilon$ and
$Y_n = L(X_n) + \varepsilon$.

(4) The measure $\mu_{\varepsilon+L(x)}$ is absolutely continuous with respect to some $\sigma$-finite
measure $\nu$ on $(G, \mathcal{G})$ for any $x \in F$, and its density

$$\rho(x,y) := \frac{d\mu_{\varepsilon+L(x)}}{d\nu}(y)$$

is a $\mu_Z \times \nu$-measurable function on $F \times G$ for random variables $Z = X$ and
$Z = X_n$, $n \in \mathbb{N}$.

When Assumption A holds, we can use Theorem 3.3 to represent the
approximated posterior distribution of $X_n$ given $y = Y_n(\omega_0)$ as

$$\mu_n(U,y) := \frac{\int 1_U(x)\rho(x,y)\,d\mu_{X_n}(x)}{\int \rho(x,y)\,d\mu_{X_n}(x)} \tag{12}$$

and the posterior distribution of $X$ given $y = Y(\omega_0)$ as

$$\mu(U,y) := \frac{\int 1_U(x)\rho(x,y)\,d\mu_X(x)}{\int \rho(x,y)\,d\mu_X(x)} \tag{13}$$

for all $U \in \mathcal{F}$ and $y \in M_0$, where

$$M_0 = \Big\{y \in G : 0 < \int \rho(x,y)\,d\mu_Z(x) < \infty \text{ for } Z = X, X_n, \text{ where } n \in \mathbb{N}\Big\} \tag{14}$$

has full $\mu_Y$-measure.

We recall some definitions on the convergence of measures. 

Definition 4.2. Let $m$ and $m_n$, where $n \in \mathbb{N}$, be $\sigma$-finite measures on a topological
space $F$ equipped with the Borel $\sigma$-algebra $\mathcal{F}$.

(i) The measures $m_n$ converge weakly to $m$ if

$$\lim_{n\to\infty} \int f(x)\,dm_n(x) = \int f(x)\,dm(x)$$

for all bounded continuous functions $f$ on $F$.

(ii) The measures $m_n$ converge setwise to $m$ if

$$\lim_{n\to\infty} m_n(U) = m(U)$$

for every $U \in \mathcal{F}$.

(iii) The measures $m_n$ converge in variation to $m$ if

$$\lim_{n\to\infty} \sup_{U \in \mathcal{F}} |m_n(U) - m(U)| = 0.$$
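The three modes of Definition 4.2 can be told apart already with point masses on the real line. The following sketch is our own illustration: measures are represented as dictionaries from atoms to masses (an assumption made only for this example), and the supremum in (iii) is computed through the identity $\sup_U |m_1(U) - m_2(U)| = \frac{1}{2}\sum_x |m_1(\{x\}) - m_2(\{x\})|$, which is valid for discrete probability measures.

```python
import math


def integrate(measure, f):
    """Integral of f against a discrete measure {atom: mass}."""
    return sum(w * f(x) for x, w in measure.items())


def mass(measure, indicator):
    """Measure of the set described by the boolean function `indicator`."""
    return sum(w for x, w in measure.items() if indicator(x))


def variation_distance(m1, m2):
    """sup_U |m1(U) - m2(U)| for two discrete probability measures."""
    atoms = set(m1) | set(m2)
    return 0.5 * sum(abs(m1.get(x, 0.0) - m2.get(x, 0.0)) for x in atoms)


delta0 = {0.0: 1.0}


def shifted(n):
    """Point mass at 1/n: converges to delta0 weakly, but not setwise."""
    return {1.0 / n: 1.0}


def mixed(n):
    """(1 - 1/n) delta_0 + (1/n) delta_1: converges to delta0 in variation."""
    return {0.0: 1.0 - 1.0 / n, 1.0: 1.0 / n}


n = 1000
print(integrate(shifted(n), math.cos) - integrate(delta0, math.cos))  # close to 0
print(mass(shifted(n), lambda x: x == 0.0))                           # stays 0
print(variation_distance(shifted(n), delta0))                         # stays 1
print(variation_distance(mixed(n), delta0))                           # equals 1/n
```

The set $U = \{0\}$ is the witness that the shifted point masses do not converge setwise, although the integrals of every bounded continuous function converge.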

It is well-known that the weak convergence of the probability measures implies
the convergence of certain expectations on regular enough spaces. The following
lemma generalizes slightly Lemma 8.4.3 in [13] by requiring that the discontinuities
of $f$ belong to some $m$-zero measurable set. The proof for the present case seems
not to be readily available in the literature.

Lemma 4.3. Let $F$ be a locally convex Souslin topological vector space and $m, m_n$,
where $n \in \mathbb{N}$, be finite measures on $(F, \mathcal{F})$. Let $f$ be an $m$-integrable Borel function
on $F$ whose discontinuities are contained in an $m$-zero measurable set. If

$$\lim_{C\to\infty} \sup_n \int_{|f| > C} |f|(x)\,dm_n(x) = 0,$$

then $m_n(f)$ converge to $m(f)$ whenever $m_n$ converge weakly to $m$.
Proof. Consider first a bounded Borel measurable function $g$, say $|g| \le c$, whose
points of discontinuity belong to an $m$-zero measurable set $N_g$. The integral of $g$
can be written as

$$\int_F g\,dm_n = \int_{[-c,c]} t\,d(m_n \circ g^{-1})(t),$$

where the integrand is bounded and continuous on $[-c,c]$. We show that the
measures $m_n \circ g^{-1}$ on $[-c,c]$ converge weakly to $m \circ g^{-1}$, which immediately implies
the convergence of $m_n(g)$ to $m(g)$ as $n$ grows. We apply a well-known property of
completely regular spaces (see Corollary 8.2.4 in [13]), according to which the weak
convergence of $m_n \circ g^{-1}$ to the Radon measure $m \circ g^{-1}$ is equivalent to

$$\limsup_{n\to\infty} m_n \circ g^{-1}(A) \le m \circ g^{-1}(A)$$

for all closed sets $A$. Note that all locally convex spaces are completely regular. If
$A \subset [-c,c]$ is closed, the closure $\overline{g^{-1}(A)} \subset g^{-1}(A) \cup N_g$ because $g$ is continuous in
the relative topology of $F \setminus N_g$. Since $N_g$ has zero $m$-measure,

$$m(g^{-1}(A)) = m(\overline{g^{-1}(A)}) \ge \limsup_{n\to\infty} m_n(\overline{g^{-1}(A)}) \ge \limsup_{n\to\infty} m_n(g^{-1}(A))$$

by the weak convergence of measures $m_n$.

Let $\wedge$ denote the binary operation of taking the minimum of two real numbers.
For the general case, we approximate $f$ with bounded functions $\mathrm{sgn}(f)(|f| \wedge C)$ in
the difference

$$|(m_n - m)(f)| = |(m_n - m)(f - \mathrm{sgn}(f)(|f| \wedge C) + \mathrm{sgn}(f)(|f| \wedge C))|$$
$$\le \sup_n \int_{|f| > C} |f|(x)\,d(m_n + m) + \Big| \int \mathrm{sgn}(f)(|f| \wedge C)\,d(m_n - m) \Big|. \tag{15}$$

By the assumption, the first term in the sum (15) gets arbitrarily small when $C$ is
chosen large enough. Since $\mathrm{sgn}(f)(|f| \wedge C) =: g$ is bounded and $m$-a.e. continuous,
the second term in the sum (15) converges to zero for fixed $C$ when $n$ grows by the
weak convergence of the measures $m_n$. □

Lemma 4.3 can be applied for $f = g\rho(\cdot,y)$, $m_n = \mu_{X_n}$ and $m = \mu_X$, where $g$ is
any continuous bounded function on $F$.

Theorem 4.4. Let Assumption A hold and let $\mu_n$, $\mu$ and $M_0$ be defined by equations
(12), (13) and (14), respectively. Let $y \in M_0$ and let the discontinuities of $x \mapsto
\rho(x,y)$ belong to a $\mu_X$-zero measurable set. If the functions $x \mapsto \rho(x,y)$ satisfy

$$\lim_{C\to\infty} \sup_n \int_{\{x : \rho(x,y) > C\}} |\rho(x,y)|\,d\mu_{X_n}(x) = 0,$$

then the approximated posterior distributions $\mu_n(\cdot,y)$ converge weakly to the
posterior distribution $\mu(\cdot,y)$ whenever the approximated prior distributions $\mu_{X_n}$ converge
weakly to the prior distribution $\mu_X$.
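A one-dimensional toy case makes the statement of Theorem 4.4 concrete. The sketch below rests on our own assumptions, none of which come from the text: the prior $\mu_X$ is uniform on $[0,1]$, the approximated priors $\mu_{X_n}$ are uniform on the $n$ midpoints $(k + 1/2)/n$ (which converge weakly to $\mu_X$), $L$ is the identity, and the noise is Gaussian with a hypothetical standard deviation `s`, so that $\rho(x,y)$ is proportional to $\exp(-(y - x)^2 / (2s^2))$ and bounded. The set $U = [0, 0.5]$ has a boundary of posterior measure zero, so weak convergence implies convergence of $\mu_n(U, y)$.

```python
import math


def rho(x, y, s=0.3):
    """Gaussian likelihood up to a constant that cancels in (12) and (13)."""
    return math.exp(-(y - x) ** 2 / (2.0 * s * s))


def approx_posterior(n, y, upper=0.5):
    """mu_n([0, upper], y) for the prior that is uniform on n midpoints."""
    pts = [(k + 0.5) / n for k in range(n)]
    num = sum(rho(x, y) for x in pts if x <= upper)
    den = sum(rho(x, y) for x in pts)
    return num / den


def exact_posterior(y, upper=0.5, s=0.3):
    """mu([0, upper], y) for the uniform prior on [0, 1], via the normal cdf."""
    def Phi(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    num = Phi((upper - y) / s) - Phi((0.0 - y) / s)
    den = Phi((1.0 - y) / s) - Phi((0.0 - y) / s)
    return num / den


y = 0.7
for n in (10, 100, 1000):
    print(n, abs(approx_posterior(n, y) - exact_posterior(y)))
```

The printed errors shrink as the discretization is refined, in line with the weak convergence of the posteriors.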

The conditional mean is a common estimate for the unknown. The convergence 
of conditional mean estimates in the weak topology of F is considered next. 

Theorem 4.5. Let Assumption A hold, let $y \in M_0$ and let the discontinuities of
$x \mapsto \rho(x,y)$ belong to a $\mu_X$-zero measurable set. If functions $x \mapsto \langle x,\alpha \rangle^k \rho(x,y)$
belong to $L^1(\mu_X)$ and satisfy

$$\lim_{C\to\infty} \sup_n \int_{\{x : |\langle x,\alpha \rangle|^k \rho(x,y) > C\}} |\langle x,\alpha \rangle|^k \rho(x,y)\,d\mu_{X_n}(x) = 0$$

for $k = 0, 1$ and $\alpha \in F'$, then the approximated weak conditional mean estimates
$\mu_n(\langle \cdot,\alpha \rangle, y)$ converge to the weak conditional mean estimate $\mu(\langle \cdot,\alpha \rangle, y)$ whenever
the approximated prior distributions $\mu_{X_n}$ converge weakly to the prior distribution
$\mu_X$.

Proof. The numerators and the denominators of

$$\mu_n(\langle \cdot,\alpha \rangle, y) = \frac{\int \langle x,\alpha \rangle \rho(x,y)\,d\mu_{X_n}(x)}{\int \rho(x,y)\,d\mu_{X_n}(x)}$$

converge as $n$ grows by Lemma 4.3, and the limit of their quotients is $\mu(\langle \cdot,\alpha \rangle, y)$. □
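The quotient in the proof can be evaluated explicitly in a scalar toy case. As a hedged illustration with assumptions of our own (uniform prior on $[0,1]$, approximated priors uniform on $n$ midpoints, $L$ the identity, Gaussian noise with a hypothetical standard deviation `s`, and $\alpha$ acting as the identity functional), the posterior is a Gaussian truncated to $[0,1]$, whose mean has a closed form that the discretized conditional means should approach.

```python
import math


def rho(x, y, s=0.3):
    """Gaussian likelihood up to a constant that cancels in the quotient."""
    return math.exp(-(y - x) ** 2 / (2.0 * s * s))


def approx_cond_mean(n, y):
    """Quotient of numerator and denominator for the n-point prior."""
    pts = [(k + 0.5) / n for k in range(n)]
    numerator = sum(x * rho(x, y) for x in pts)
    denominator = sum(rho(x, y) for x in pts)
    return numerator / denominator


def exact_cond_mean(y, s=0.3):
    """Mean of N(y, s^2) truncated to [0, 1] (posterior under the uniform prior)."""
    def phi(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

    def Phi(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    a, b = (0.0 - y) / s, (1.0 - y) / s
    return y + s * (phi(a) - phi(b)) / (Phi(b) - Phi(a))


y = 0.7
for n in (10, 100, 2000):
    print(n, approx_cond_mean(n, y), exact_cond_mean(y))
```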

When $F$ is a separable Banach space, we can state conditions for the norm
convergence of the conditional mean estimates that are defined as Bochner integrals,
that is, $\int x\,dm(x)$ is taken to be the limit in $F$ of integrals of simple functions of
the form $x \mapsto \sum_{k=1}^{K} x_k 1_{U_k}(x)$, where $x_k \in F$ and $U_k \in \mathcal{F}$.

Theorem 4.6. Let Assumption A hold and let $\mu_n$, $\mu$ and $M_0$ be defined by equations
(12), (13) and (14), respectively. Let $y \in M_0$ and let the discontinuities of $x \mapsto
\rho(x,y)$ belong to a $\mu_X$-zero measurable set. Additionally, let $F$ be a separable Banach
space with norm $\|\cdot\|$.

If $\|\cdot\|^k \rho(\cdot,y) \in L^1(\mu_X)$ satisfies

$$\lim_{C\to\infty} \sup_n \int_{\{x : \|x\|^k \rho(x,y) > C\}} \|x\|^k \rho(x,y)\,d\mu_{X_n}(x) = 0 \tag{16}$$

for $k = 0, 1$, then the conditional mean estimates $\int x\,\mu_n(dx,y)$ converge in the norm
of $F$ to the conditional mean estimate $\int x\,\mu(dx,y)$ whenever the approximated prior
distributions $\mu_{X_n}$ converge weakly to the prior distribution $\mu_X$.

Proof. The assumptions for $k = 0$ guarantee that the denominators $\int \rho(x,y)\,d\mu_{X_n}(x)$
of the posterior distributions converge to $\int \rho(x,y)\,d\mu_X(x)$ as $n \to \infty$ by Lemma 4.3.

The function $\|\cdot\|\rho(\cdot,y)$ has a finite expectation with respect to all measures $\mu_{X_n}$,
$n \in \mathbb{N}$, and $\mu_X$. Therefore, the mapping $x \mapsto x\rho(x,y)$ is Bochner integrable with
respect to all $\mu_{X_n}$ and $\mu_X$, and its discontinuities belong to a $\mu_X$-zero measurable
set.

For the moment, let us choose random variables $X_n$ and $X$ on another
probability space $(\Omega, \Sigma, P)$ in such a way that their image measures are $\mu_{X_n}$ and
$\mu_X$, respectively, and the random variables $X_n$ converge almost surely to $X$ as
$n \to \infty$. Such a choice is possible by the Skorokhod representation theorem (see
Theorem 8.5.4 in [13]). Especially, $X_n\rho(X_n,y) - X\rho(X,y)$ is Bochner integrable
with respect to the probability measure $P$ by the triangle inequality. Denote
$A_C = \{\|X_n\rho(X_n,y) - X\rho(X,y)\| > C\}$ for $C > 0$. Then the numerators of the
posterior distributions satisfy

$$\Big\| \int x\rho(x,y)\,d\mu_{X_n}(x) - \int x\rho(x,y)\,d\mu_X(x) \Big\| = \Big\| \int X_n\rho(X_n,y) - X\rho(X,y)\,dP \Big\|$$
$$\le \int_{A_C} \|X_n\rho(X_n,y) - X\rho(X,y)\|\,dP + \int C \wedge \|X_n\rho(X_n,y) - X\rho(X,y)\|\,dP$$
$$=: I_1(n;C) + I_2(n;C).$$

The integrals $I_1(n;C)$ vanish when $C \to \infty$ since their arguments are uniformly
integrable. Indeed, both $\|X_n\rho(X_n,y)\|$ and $\|X\rho(X,y)\|$ are uniformly integrable,
as is also their sum. Any sequence of non-negative functions that has a uniformly
integrable upper bound with respect to a finite measure is again uniformly integrable
with respect to the finite measure. These facts are direct consequences of the
characterization of uniformly integrable functions through uniformly absolutely
continuous integrals (see Proposition 4.5.3 in [13]).

The integrals $I_2(n;C)$ for a fixed $C$ converge to zero as $n \to \infty$ by the Lebesgue
dominated convergence theorem and continuity properties of $\rho$. □

Remark 6. Theorem 4.6 generalizes the similar convergence results of [63, 96],
in which the spaces $F$ and $G$ are separable Banach spaces and $\varepsilon$ is Gaussian, to
non-Gaussian $\varepsilon$ on more general spaces. In [63, 96], it is assumed that

$$\sup_n E[\exp(a\|X_n\|)] < \infty$$

for all $a > 0$. This attractive condition is stronger than (16) for the given $\rho$, which
has the form $\rho(x,y) = \exp(\langle y, Lx \rangle_G - \frac{1}{2}\|Lx\|_H^2)$, where $H$ is a certain Hilbert
space. Indeed, by the de la Vallée Poussin theorem (e.g. Theorem 4.5.9 in [13]) the
condition (16) is equivalent to the existence of nonnegative increasing functions
$g_k$ on $\mathbb{R}$ such that $\lim_{t\to\infty} t^{-1}g_k(t) = +\infty$ and $\sup_n \int g_k(\|x\|^k \rho(x,y))\,d\mu_{X_n}(x) < \infty$.
Moreover,

$$\|x\|^k \rho(x,y) = \|x\|^k \exp(\langle y, Lx \rangle_G - \tfrac{1}{2}\|Lx\|_H^2) \le \exp(a\|x\|),$$

where $a = 1 + \|y\|_G \|L\|_{F \to G}$. The choice $g(t) =$ [...] guarantees that the condition
(16) holds when $\sup_n E[\exp(a\|X_n\|)] < \infty$.
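The pointwise bound $\|x\|^k \rho(x,y) \le \exp(a\|x\|)$ rests on $\|x\|^k \le e^{\|x\|}$ for $k = 0, 1$, on $\langle y, Lx \rangle \le \|y\|\,\|L\|\,\|x\|$, and on $\exp(-\frac{1}{2}\|Lx\|^2) \le 1$. A scalar stand-in makes this easy to check numerically. In the sketch below, which is only an illustration under our own simplifications, the operator $L$ is multiplication by a real number `l`, both norms are the absolute value, and the grid `xs` is an arbitrary choice.

```python
import math


def rho(x, y, l):
    """Scalar stand-in: <y, Lx> = y*l*x and ||Lx||^2 = (l*x)^2."""
    return math.exp(y * l * x - 0.5 * (l * x) ** 2)


def bound_holds(y, l, k, xs):
    """Check |x|^k * rho(x, y) <= exp(a |x|) with a = 1 + |y| |l| on the grid xs."""
    a = 1.0 + abs(y) * abs(l)
    return all(abs(x) ** k * rho(x, y, l) <= math.exp(a * abs(x)) for x in xs)


xs = [i / 10.0 for i in range(-200, 201)]
checks = [
    bound_holds(y, l, k, xs)
    for y in (-3.0, 0.5, 3.0)
    for l in (-2.0, 1.0)
    for k in (0, 1)
]
print(all(checks))
```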

Next, we pursue a stronger convergence of the posteriors.

Theorem 4.7. Let Assumption A hold and let $\mu_n$, $\mu$ and $M_0$ be defined by equations
(12), (13) and (14), respectively. Let $y \in M_0$.

If the measures $U \mapsto \int_U \rho(x,y)\,d\mu_{X_n}(x)$ on $(F, \mathcal{F})$ are uniformly bounded, and
equicontinuous at zero in the sense that for every decreasing sequence $\{U_i\} \subset \mathcal{F}$
with empty intersection,

$$\lim_{i\to\infty} \sup_n \int_{U_i} \rho(x,y)\,d\mu_{X_n}(x) = 0,$$

then the approximated posterior distributions $\mu_n(\cdot,y)$ converge setwise (or in
variation) to the posterior distribution $\mu(\cdot,y)$ whenever the approximated prior
distributions $\mu_{X_n}$ converge setwise (or in variation) to the prior distribution $\mu_X$.

Proof. Assume first that the approximated prior distributions converge setwise.
Define a finite measure $\nu := \mu_X + \sum_n 2^{-n}\mu_{X_n}$ on $(F, \mathcal{F})$. Each $\mu_{X_n}$ is absolutely
continuous with respect to $\nu$ and has Radon-Nikodym density $f_n := \frac{d\mu_{X_n}}{d\nu}$.
Similarly, $\mu_X$ has the density $f := \frac{d\mu_X}{d\nu}$.

The measurable function $x \mapsto \rho(x,y)$ is an increasing limit of some simple
functions $\phi_y^{(i)}(x)$ and, by Egorov's theorem, the convergence is almost uniform with
respect to the measure $\nu$. That is, for every $\epsilon > 0$, there exists a set $A_\epsilon$ such that
$\phi_y^{(i)}$ converge uniformly to $\rho(\cdot,y)$ on $A_\epsilon$ and $\nu(A_\epsilon^c) < \epsilon$. One may choose a sequence
$\epsilon_j \to 0$ and get increasing sets $A_{\epsilon_j}$ such that $\nu(\cap_j A_{\epsilon_j}^c) = 0$ and the simple functions
$\phi_y^{(i)}$ converge uniformly on each $A_{\epsilon_j}$. But then

$$\Big| \int_U \rho(x,y)\,d\mu_X(x) - \int_U \rho(x,y)\,d\mu_{X_n}(x) \Big|
\le \Big| \int_U \phi_y^{(i)}(x)\,d\mu_X(x) - \int_U \phi_y^{(i)}(x)\,d\mu_{X_n}(x) \Big|$$
$$+ \Big| \int_U \big(\rho(x,y) - \phi_y^{(i)}(x)\big)\,d\mu_X(x) - \int_U \big(\rho(x,y) - \phi_y^{(i)}(x)\big)\,d\mu_{X_n}(x) \Big|,$$

where the last term is bounded by

$$\int_{U \cap A_{\epsilon_j}} |\rho(x,y) - \phi_y^{(i)}(x)|(f_n + f)(x)\,d\nu(x) + \int_{A_{\epsilon_j}^c} |\rho(x,y) - \phi_y^{(i)}(x)|(f_n + f)(x)\,d\nu(x)
=: I_1 + I_2.$$

In the integral $I_2$, the estimate $|\rho(x,y) - \phi_y^{(i)}(x)| \le \rho(x,y)$ gives

$$I_2 \le \sup_n \int_{A_{\epsilon_j}^c} \rho(x,y)\,d(\mu_{X_n} + \mu_X)(x). \tag{17}$$

If the intersection of the sets $A_{\epsilon_j}^c$ is not empty, we subtract the $\nu$-zero measurable
intersection from each $A_{\epsilon_j}^c$. Then the equicontinuity at zero of measures $U \mapsto
\int_U \rho(x,y)\,d\mu_{X_n}(x)$ implies that the integrals (17) are bounded by any given positive
number when $j$ is large enough. The final thing is to choose the simple function $\phi_y^{(i)}$
so that $|\rho(\cdot,y) - \phi_y^{(i)}|$ is small enough on the chosen $A_{\epsilon_j}$ and then choose large enough
$n$ so that $|\mu_X(1_U\phi_y^{(i)}) - \mu_{X_n}(1_U\phi_y^{(i)})|$ gets small enough. This is possible since the
integrand is a bounded simple function and $\mu_{X_n}$ converge setwise to $\mu_X$.

In order to prove convergence in variation, just add $\sup_{U \in \mathcal{F}}$ in front of the above
estimates. □

Equivalent conditions for the equicontinuity at zero of a bounded family of
measures $m_n$ on $(F, \mathcal{F})$ are presented in Lemma 4.6.5 in [13]. The setwise convergence
of measures $\mu_{X_n}$ actually implies that they are equicontinuous at zero by Theorem
4.6.3 in [13].

5. Examples of noise 
Below, some cases are presented, where the Radon-Nikodym derivatives

\[ \frac{d\mu_{\varepsilon + L(x)}}{d\nu} \]

exist with respect to some $\sigma$-finite measure $\nu$. The first two cases, where the noise term
is finite-dimensional or Gaussian, are well-known. For these cases, we apply the
results of previous sections. The next four cases demonstrate that the approach
taken in this paper applies also for more general noise models.

5.1. Finite-dimensional noise with a probability density. This example extends
the convergence results in [44] to locally convex Souslin space-valued unknowns.
Let $G$ be the Euclidean space $\mathbb{R}^k$, let $F$ be a locally convex Souslin space,
and let $L : F \to G$ be a continuous mapping. Consider the statistical inverse
problem of estimating the distribution of an $F$-valued random variable $X$ given a
sample $y$ of a $G$-valued random variable $Y = L(X) + \varepsilon$, where the $G$-valued random
variable $\varepsilon$ is statistically independent from $X$. In order to use the representation
formula of Theorem 3.3 for the essentially unique posterior distribution of $X$ given
a sample $y_0$ of $Y$, we need the required $\sigma$-finite measure $\nu$. A natural choice is to
take the Lebesgue measure as $\nu$, when possible.

Assume that the noise $\varepsilon$ is an $\mathbb{R}^k$-valued random vector whose image measure $\mu_\varepsilon$ is
absolutely continuous with respect to the Lebesgue measure, say $\mu_\varepsilon(dx) = D_\varepsilon(x)\,dx$,
with the property that $D_\varepsilon > 0$ almost everywhere. In particular, $\mu_\varepsilon$ is then equivalent
to the Lebesgue measure.

In Theorem 3.3, the Radon-Nikodym derivative of $\mu_{\varepsilon+L(x)}$ with respect to the
Lebesgue measure, i.e. $(x,y) \mapsto D_\varepsilon(y - L(x))$, is required to be jointly measurable.
Since $D_\varepsilon$ is measurable, and the addition is measurable, the continuity of $L$ suffices
here. We obtain an essentially unique solution $\mu$ of the statistical inverse problem
of estimating the distribution of $X$ given a sample $y_0$ of $Y$ that satisfies

\[
\mu(U, y_0) = \frac{\int_U D_\varepsilon(y_0 - L(x))\,d\mu_X(x)}{\int D_\varepsilon(y_0 - L(x))\,d\mu_X(x)}
\]

for all $U \in \mathcal{F}$ and all $y_0$ such that $0 < \int D_\varepsilon(y_0 - L(x))\,d\mu_X(x) < \infty$. Here
$D_\varepsilon(y_0 - L(x))$ is often called the likelihood function. If $D_\varepsilon$ is continuous and bounded,
we may drop the word "essentially", as the solution is the unique continuous
solution by Corollary 1 (the topological support of $\mu_Y$ is the whole space by Lemma
3.5 since $\mu_{\varepsilon+L(x)}$ is equivalent with the Lebesgue measure).

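When $X$ is an $\mathbb{R}^m$-valued random variable with a density $D_{pr}(x)$ with respect
to the Lebesgue measure, we get the familiar expression

\[
\mu(U, y) = \frac{\int_U D_\varepsilon(y - L(x))\,D_{pr}(x)\,dx}{\int D_\varepsilon(y - L(x))\,D_{pr}(x)\,dx}
\]

for all $y$ such that $0 < \int D_\varepsilon(y - L(x))\,D_{pr}(x)\,dx < \infty$.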
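As a rough numerical illustration of the finite-dimensional Bayes formula, the following sketch approximates $\mu(U,y)$ by a Riemann sum in one dimension. The Gaussian noise and prior densities, the identity direct theory $L(x) = x$, the grid, and all function names are assumptions chosen for illustration only; they do not come from the text.

```python
import math

def gaussian_density(sd):
    """Density of N(0, sd^2); used here for both the noise and the prior."""
    return lambda t: math.exp(-0.5 * (t / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probability(y, forward, noise_density, prior_density, grid, in_U):
    """Riemann-sum approximation of mu(U, y) on a uniform grid."""
    # Unnormalised posterior density D_eps(y - L(x)) * D_pr(x) at each grid node.
    weights = [noise_density(y - forward(x)) * prior_density(x) for x in grid]
    denominator = sum(weights)  # the grid spacing cancels in the ratio
    if not 0.0 < denominator < math.inf:
        raise ValueError("the formula requires 0 < denominator < infinity")
    numerator = sum(w for x, w in zip(grid, weights) if in_U(x))
    return numerator / denominator

grid = [-10.0 + 0.001 * k for k in range(20001)]
p = posterior_probability(
    y=1.0,
    forward=lambda x: x,                  # hypothetical direct theory L(x) = x
    noise_density=gaussian_density(1.0),  # hypothetical D_eps
    prior_density=gaussian_density(1.0),  # hypothetical D_pr
    grid=grid,
    in_U=lambda x: x > 0.5,               # U = (0.5, infinity)
)
```

In this conjugate toy case the exact posterior is Gaussian with mean 0.5, so `p` should be close to one half; the check on the denominator mirrors the condition under which the formula is stated.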

Remark 7. When $D_\varepsilon > 0$ fails to hold almost everywhere, $\mu_\varepsilon$ need not be equivalent to the
Lebesgue measure. Moreover, the translated measure $\mu_{\varepsilon+L(x)}$ need not be absolutely
continuous with respect to $\mu_\varepsilon$.

We consider next the convergence of posterior distributions. Let $\mu_{X_n}$ be the
finite-dimensional distributions that approximate $\mu_X$ and denote with $X_n$ the
corresponding $F$-valued random variables that are statistically independent from $\varepsilon$.
Denote by

\[
\mu_n(U, y) = \frac{\int_U D_\varepsilon(y - L(x))\,d\mu_{X_n}(x)}{\int D_\varepsilon(y - L(x))\,d\mu_{X_n}(x)}
\]

the corresponding solutions of estimating the probabilities of $X_n$ given $Y_n = L(X_n) + \varepsilon$.
When $D_\varepsilon$ is continuous and bounded, the probabilities $\mu_n(\cdot, y)$ converge weakly
to

\[
\mu(\cdot, y) = \frac{\int_{(\cdot)} D_\varepsilon(y - L(x))\,d\mu_X(x)}{\int D_\varepsilon(y - L(x))\,d\mu_X(x)}
\]

for all $y$ such that

\[
\inf_n \int D_\varepsilon(y - L(x))\,d\mu_{X_n}(x) > 0 \quad\text{and}\quad \sup_n \int D_\varepsilon(y - L(x))\,d\mu_{X_n}(x) < \infty
\]

whenever $\mu_{X_n}$ converge weakly to $\mu_X$ by Theorem 4.4. Also Theorem 4.6 and
Theorem 4.7 are available, provided the assumptions hold.
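The effect of refining the prior approximation can be observed numerically in a toy setting. The sketch below assumes a uniform prior on $[0,1]$, approximates it by the discrete measures that put mass $1/n$ on the midpoints $(k+0.5)/n$ (which converge weakly to the uniform distribution), and uses a bounded continuous standard normal noise density with the hypothetical direct theory $L(x) = x^2$. It only illustrates the kind of convergence discussed above and is not a substitute for the assumptions of Theorem 4.4.

```python
import math

def standard_normal_density(t):
    """Bounded continuous noise density D_eps (hypothetical choice)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def posterior_mean(y, n, forward=lambda x: x * x, noise_density=standard_normal_density):
    """Posterior mean of x under the prior with mass 1/n at each midpoint (k + 0.5)/n."""
    nodes = [(k + 0.5) / n for k in range(n)]
    weights = [noise_density(y - forward(x)) for x in nodes]  # likelihood at each atom
    return sum(w * x for w, x in zip(weights, nodes)) / sum(weights)

# A very fine approximation serves as the reference value.
reference = posterior_mean(0.3, 100000)
errors = {n: abs(posterior_mean(0.3, n) - reference) for n in (10, 100, 1000)}
```

The dictionary `errors` records how far the posterior mean under each coarse prior is from the reference value; in this smooth example the discrepancy shrinks as `n` grows.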

In practical applications one often takes such approximations of $\mu_X$ that can be
identified with a probability distribution on $\mathbb{R}^n$ by some linear isomorphism
defined on a subspace of full measure.
5.2. Infinite-dimensional Gaussian noise. The finite-dimensional Gaussian noise
model is often chosen because of its relatively straightforward justification - if the 
total noise is produced by many identical independent noise sources, the sum is 
nearly Gaussian by the central limit theorem. For instance, this applies to the ori- 
gin of thermal noise in electrical circuits, where heat motion of the charge carriers 
disturbs the analog signal. The usual model of thermal noise is white Gaussian 
noise, which is an acceptable approximation on usual frequencies. 

We first recall a method for constructing infinite-dimensional Gaussian random 
vectors by a procedure linked to abstract Wiener spaces [12]. 

5.2.1. Basics of Hilbert space-valued Gaussian random variables. Let $H$ be a
separable Hilbert space. We define $Z$ as a random sum

\[
(18) \qquad Z = \sum_{i=1}^{\infty} Z_i e_i,
\]

where $Z_i$ are independent standard normal random variables on $(\Omega, \Sigma, P)$ and $\{e_i\}$
is an orthonormal basis of $H$. Clearly, the sum does not converge a.s. in $H$. Instead,
we take a larger Hilbert space $G$ into which $H$ can be imbedded with an injective
Hilbert-Schmidt operator $j$. When the range of the imbedding is dense, the triple
$(j, H, G)$ is a special case of an abstract Wiener space [12]. However, we do not
require the range to be dense. Let $G'$ denote the dual space of $G$ and $\langle\cdot,\cdot\rangle$ the
duality between $G$ and $G'$.

A sufficient condition for the a.s. convergence of the random sums $\sum_{i=1}^{n} Z_i e_i$ in
$G$ is that the series

\[
\sum_{i=1}^{\infty} E\big[\|Z_i e_i\|_G^2\big] = \sum_{i=1}^{\infty} \|e_i\|_G^2
\]

is convergent [73]. But this follows from the Hilbert-Schmidt property of the
inclusion map $j$.
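The summability condition can be made concrete in a simple model case. The sketch below assumes $H = \ell^2$ with the standard basis and takes for $G$ the weighted sequence space with norm $\|x\|_G^2 = \sum_i x_i^2 / i^2$; this particular choice of $G$ is a hypothetical example, not one used in the text. Then $\|e_i\|_G^2 = i^{-2}$, so the series above converges to $\pi^2/6$, and the $G$-norm of a truncated random sum stays finite.

```python
import math
import random

def hilbert_schmidt_norm_squared(n):
    """Partial sum of sum_i ||e_i||_G^2 = sum_i i^(-2) for the weighted norm above."""
    return sum(1.0 / (i * i) for i in range(1, n + 1))

def sampled_norm_squared(n, rng):
    """||sum_{i<=n} Z_i e_i||_G^2 = sum_i Z_i^2 / i^2 for one draw of Z_1, ..., Z_n."""
    return sum(rng.gauss(0.0, 1.0) ** 2 / (i * i) for i in range(1, n + 1))

rng = random.Random(0)                       # fixed seed for reproducibility
hs_limit = hilbert_schmidt_norm_squared(100000)
one_path = sampled_norm_squared(100000, rng)
```

Here `hs_limit` is also the expected value of `one_path`, which is the identity between the two series in the display above.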

Since $G$ is a separable Fréchet space (more generally, a locally convex Souslin
space [129]), its Borel $\sigma$-algebras with respect to the weak and the original topology
coincide. The benefit of the weak topology is that the measurability of the limit
$Z = \lim_{n\to\infty} \sum_{i=1}^{n} Z_i e_i$ can be checked similarly as in the case of real-valued functions
with sets of the type $\bigcap_{i=1}^{k} \{ |\langle Z, \phi_i\rangle - a_i| < a \}$. We conclude that the a.s. limit
$Z$ of the random sums defines a measurable mapping from $(\Omega, \Sigma, P)$ to $(G, \mathcal{G})$. Its
image measure $\mu_Z = P \circ Z^{-1}$ can be viewed also as a countably additive cylinder
set measure.

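In general, the mean of a random variable $Z$ is the vector $m \in G'' = G$ such that
$E[\langle Z, \phi\rangle] = \langle m, \phi\rangle$ for all $\phi \in G'$ and the covariance operator of $Z$ is the mapping
$C : G' \to G$ such that

\[
\langle C\phi, \psi\rangle = E\big[(\langle Z,\phi\rangle - \langle m,\phi\rangle)(\langle Z,\psi\rangle - \langle m,\psi\rangle)\big]
\]

for all $\phi, \psi \in G'$ [12].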

Since limits of Gaussian random variables are Gaussian, the random variable

\[
\langle Z + m, \phi\rangle = \langle m, \phi\rangle + \sum_{i=1}^{\infty} Z_i \langle e_i, \phi\rangle,
\]

where $m \in G$, has a characteristic function

\[
(19) \qquad E\big[e^{i\langle Z+m,\phi\rangle}\big] = \exp\Big( i\langle m,\phi\rangle - \frac{1}{2}\sum_{i=1}^{\infty} \langle e_i,\phi\rangle^2 \Big)
\]

for all $\phi \in G'$. In our case, the random variable $Z$ has mean $m = 0$ and covariance
$\langle C\phi,\phi\rangle = \sum_{i=1}^{\infty} \langle e_i,\phi\rangle^2$. The covariance $\langle C\phi,\phi\rangle$ is the squared norm of $\phi$ in the
strong dual space $H'$ of $H$. Indeed, the linear form $\langle j\,\cdot,\phi\rangle_{G,G'}$ is continuous on $H$
so it belongs to $H'$ and its squared norm is $\langle C\phi,\phi\rangle$. For short, we denote $\phi \in H'$. The
covariance $\langle C\phi,\phi\rangle$ for any $\phi \in G'$ is finite, since $H \hookrightarrow G$ implies that $G' \hookrightarrow H'$
continuously. The dual space $G'$ is actually dense in $H'$ as a consequence of the
Hahn-Banach theorem. Indeed, if there were $h'_0 \in H' \setminus \overline{j'(G')}$, then there would exist
$h \in H'' = H$ such that $\langle h, h'_0\rangle = 1$ and $\langle h, h'\rangle = 0$ for every $h' \in j'(G')$. But $j'(G')$
separates the points in $H$ because of the injectivity of $j$. Therefore, $h = 0$, which is a
contradiction, and hence $j'(G')$ is dense in $H'$.

The mapping $G' \ni \phi \mapsto \langle C\phi,\phi\rangle$ has an extension $H' \ni g \mapsto \langle Cg, g\rangle := \|g\|_{H'}^2$.
By the polarization equality, the extended $C$ is the isometric isomorphism between $H'$ and $H$
defined by the Riesz representation theorem. We continue to denote the extension by $C$.

Remark 8. It is well-known that the sample space $G$ of $Z$ can be replaced with
any bigger locally convex Souslin vector space $G_0$ into which $G$ can be continuously
and injectively embedded. For example, $G_0$ may be the distribution space $\mathcal{D}'(U)$,
where $U \subset \mathbb{R}^n$ is open, equipped with the usual weak topology.

Measures having characteristic functions of the above form (19) are called Gaussian
measures. In particular, the image measure $\mu_Z = P \circ Z^{-1}$ is Gaussian. Random
variables whose image measures are Gaussian are called Gaussian random variables.
The space $H$ is the so-called Cameron-Martin space of $\mu_Z$.

By Theorems 3.2.3, 3.2.7 and 3.5.1 in [12] any zero-mean Gaussian random variable
on a locally convex Souslin space is equivalent with a random variable of the
form (18). More details on Gaussian measures can be found in [12, 55, 91].

5.2.2. Inverse problems with Gaussian noise. We consider the statistical inverse
problem of estimating the distribution of $X$ given a sample of $Y = L(X) + \varepsilon$, where
$\varepsilon$ is a zero mean Gaussian random variable that has values in a separable Hilbert
space $G$.

We denote with $H_{\mu_\varepsilon}$ the Cameron-Martin space of $\mu_\varepsilon$ and with $C_\varepsilon : H'_{\mu_\varepsilon} \to H_{\mu_\varepsilon}$
the covariance operator of $\varepsilon$. The unknown random variable $X$ has values in some
locally convex Souslin topological vector space $F$. The random variables $\varepsilon$ and
$X$ are taken to be independent. The direct theory $L : F \to G$ is a continuous
mapping that satisfies the following additional restrictive conditions: $L : F \to H_{\mu_\varepsilon}$
is continuous, the range of the combined mapping $C_\varepsilon^{-1}L$ belongs to $G'$ where $G'$ is
the strong dual of $G$, and the mapping $C_\varepsilon^{-1}L : F \to G'$ is continuous.

As an approximated model, we take a sequence of $F$-valued random variables
$X_n$ that satisfy the same conditions as $X$. We denote $Y_n := L(X_n) + \varepsilon$.

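Recalling Remark 6, we require that

\[
(20) \qquad E\Big[e^{a\|C_\varepsilon^{-1}L(X)\|_{G'}}\Big] + \sup_n E\Big[e^{a\|C_\varepsilon^{-1}L(X_n)\|_{G'}}\Big] < \infty
\]

for all $a > 0$. The condition holds in particular when the range of $C_\varepsilon^{-1}L$ is bounded
in $G'$.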



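According to the famous Cameron-Martin formula (see Corollary 2.4.3 and Theorem
3.2.3 in [12]), the Gaussian measures $\mu_\varepsilon$ and $\mu_{\varepsilon+L(x)}$ are equivalent when
$L(x) \in H_{\mu_\varepsilon}$. The corresponding Radon-Nikodym density is

\[
\rho(x,z) := \frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(z) = \exp\Big( \langle z, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \Big), \qquad z \in G.
\]

Remark 9. In the Cameron-Martin formula, the notation $\langle\cdot,\cdot\rangle$ is, in general, a
measurable extension of the duality. Namely, the vector $C_\varepsilon^{-1}L(x)$ need not belong
to the space $G'$ but to the larger space $H'_{\mu_\varepsilon}$. But $G'$ is dense in $H'_{\mu_\varepsilon}$. Following
Lemma 2.2.8 in [12], we may define $\langle z, C_\varepsilon^{-1}L(x)\rangle$ as the limit of $\langle z, \phi_n\rangle$ in $L^2(\mu_\varepsilon)$
where $\phi_n \in G'$ converge to $C_\varepsilon^{-1}L(x)$ in $H'_{\mu_\varepsilon}$ as $n \to \infty$. In particular, $\langle z, C_\varepsilon^{-1}L(x)\rangle$ is
a Gaussian random variable on $(G, \mathcal{G}, \mu_\varepsilon)$. Different approximating sequences lead
to equivalent random variables, since the limits coincide in $L^2(\mu_\varepsilon)$.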

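When the range of $C_\varepsilon^{-1}L$ is contained in $G'$, we have $\langle z, C_\varepsilon^{-1}L(x)\rangle = \langle z, C_\varepsilon^{-1}L(x)\rangle_{G,G'}$, and,
consequently, the Radon-Nikodym density is separately continuous with respect to $z$
on $G$ and with respect to $x$ on $F$. By Theorem 3.4, $\rho$ is $\mu_X \times \mu_\varepsilon$-measurable. In
Theorem 3.3, we may choose $\nu = \mu_\varepsilon$ and take

\[
\mu(U, y) = \frac{\int_U \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}{\int \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}
\]

as an essentially unique solution for all $y \in G$. Note that our assumptions guarantee
that

\[
0 < \exp\Big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \Big) \le \exp\big( \|y\|_G \|C_\varepsilon^{-1}L(x)\|_{G'} \big) \in L^1(\mu_X)
\]

so that the set $M_0$ in (14) is empty. Similarly, when $X_n$ satisfies the same conditions
as $X$, we obtain

\[
\mu_n(U, y) = \frac{\int_U \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_{X_n}(x)}{\int \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_{X_n}(x)}
\]

for all $y \in G$.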
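In finite dimensions the Cameron-Martin density reduces to a ratio of two Gaussian densities, which gives a quick sanity check of the exponent. The sketch below assumes a diagonal covariance with hypothetical eigenvalues, so that the pairing is $\sum_i z_i h_i/\lambda_i$ and the Cameron-Martin norm is $\sum_i h_i^2/\lambda_i$ for a shift $h = L(x)$; the function names and the discrete prior sample are illustrative assumptions.

```python
import math

def cameron_martin_density(z, h, eigenvalues):
    """exp(<z, C^{-1} h> - 0.5 * ||h||_H^2) for a diagonal covariance C."""
    pairing = sum(zi * hi / lam for zi, hi, lam in zip(z, h, eigenvalues))
    cm_norm_sq = sum(hi * hi / lam for hi, lam in zip(h, eigenvalues))
    return math.exp(pairing - 0.5 * cm_norm_sq)

def gaussian_log_density(z, mean, eigenvalues):
    """Log density of N(mean, diag(eigenvalues)) at z."""
    return sum(-0.5 * (zi - mi) ** 2 / lam - 0.5 * math.log(2.0 * math.pi * lam)
               for zi, mi, lam in zip(z, mean, eigenvalues))

def posterior_weights(y, prior_samples, forward, eigenvalues):
    """Normalised weights of a discrete prior sample under the formula for mu_n."""
    raw = [cameron_martin_density(y, forward(x), eigenvalues) for x in prior_samples]
    total = sum(raw)
    return [r / total for r in raw]

eigenvalues = [1.0, 0.25, 0.04]   # hypothetical spectrum of C
z = [0.3, -0.2, 0.1]              # hypothetical observation
h = [0.5, 0.1, -0.02]             # hypothetical shift L(x)
density = cameron_martin_density(z, h, eigenvalues)
ratio = math.exp(gaussian_log_density(z, h, eigenvalues)
                 - gaussian_log_density(z, [0.0, 0.0, 0.0], eigenvalues))
weights = posterior_weights(z, [0.0, 1.0, 2.0], lambda x: [x * hi for hi in h], eigenvalues)
```

The values `density` and `ratio` agree up to rounding, and `weights` shows how the same exponent reweights atoms of an approximating prior.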

We consider next the partial uniqueness of the solutions $\mu$ and $\mu_n$ on $\mathcal{F} \times G$.
Denote with $S_{\mu_\varepsilon}$ the support of $\mu_\varepsilon$ on $G$, which coincides with the closure of the
Cameron-Martin space $H_{\mu_\varepsilon}$ in $G$ by Theorem 3.6.1 in [12]. The measure $\mu_{\varepsilon+L(x)}$ is
equivalent with $\mu_\varepsilon$ by the Cameron-Martin formula. Hence, the measures $\mu_Y$ and
$\mu_{Y_n}$ have the same topological support as the measure $\mu_\varepsilon$ by Lemma 3.5. We
conclude that $S_{\mu_Y} = S_{\mu_{Y_n}} = \overline{H_{\mu_\varepsilon}}$. Since $\sup_{\|z\|_G \le C} \rho(x,z) \le \exp(C\|C_\varepsilon^{-1}L(x)\|_{G'})$, the
solutions $\mu$ and $\mu_n$ are $\mathcal{F}$-continuous on $G \cap \overline{H_{\mu_\varepsilon}}$ by Theorem 3.4. Hence, $\mu$ and
$\mu_n$ are the only $\mathcal{F}$-continuous solutions on $G \cap \overline{H_{\mu_\varepsilon}}$ by Corollary 1. In the light of
Corollary 1 and the discussion preceding it, the partial uniqueness is not so simple
in the situation described in Remark 8.

In order to apply Theorem 4.4, we use the continuity of $x \mapsto \rho(x,y)$ and the
uniform integrability that follows from the assumption (20). Consequently, Theorem
4.4 holds. If, for example, the range of $C_\varepsilon^{-1}L$ is a bounded set in $G'$, also Theorem
4.7 is available.

Remark 10. In general, the measure $\mu_Y = \mu_{L(X)+\varepsilon}$ does not satisfy $\mu_{\varepsilon+L(x)} \ll \mu_Y$
for $\mu_X$-almost every $x$. Indeed, take $X$ and $\varepsilon$ to be independent Gaussian
random variables with the same Cameron-Martin space $L^2(I)$, where $I$ is the unit
interval $(0,1)$. Let $L$ be the identity. If $\mu_Y(U) = 0$, then $\mu_{\varepsilon+x}(U) = 0$ for $\mu_X$-almost
every $x$ by the formula

\[
(21) \qquad \mu_Y(U) = E\big[E[1_U(Y)\,|\,X]\big] = \int \mu_{\varepsilon+x}(U)\,d\mu_X(x).
\]

Suppose that $\mu_{\varepsilon+x} \ll \mu_Y$ for $\mu_X$-a.e. $x$, say for all $x \in M$ such that $\mu_X(M) = 1$.
The random variable $Y = X + \varepsilon$ is also Gaussian, and any two Gaussian measures
on the same locally convex space are either equivalent or singular. But then $\mu_{\varepsilon+x_1}$
is equivalent to $\mu_Y$ and $\mu_{\varepsilon+x_2}$ is equivalent to $\mu_Y$ for any $x_1, x_2 \in M$ so also the two
measures $\mu_{\varepsilon+x_1}$ and $\mu_{\varepsilon+x_2}$ are equivalent. But $P(X \in L^2(I)) = 0$ so equivalence
should hold also for some $x_1, x_2 \notin L^2(I)$, which is impossible by the Cameron-Martin
theorem. The $\mu_X$-zero measurable set in (21) necessarily depends on $U$ in
this case.

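5.3. Gaussian dominated noise. We consider a simple modification of Gaussian
noise. Suppose that the assumptions in Section 5.2.2 hold except that instead of
$Y = L(X) + \varepsilon$ we are observing $Y = L(X) + \tilde\varepsilon$, where $\mu_{\tilde\varepsilon}$ is dominated by the
Gaussian measure $\mu_\varepsilon$, i.e.

\[ \mu_{\tilde\varepsilon}(dy) = f(y)\,\mu_\varepsilon(dy) \]

for some $f \in L^1(\mu_\varepsilon)$. The translation of $\mu_{\tilde\varepsilon}$ by $L(x)$ has the form

\[
\begin{aligned}
\mu_{\tilde\varepsilon+L(x)}(V) &= \int 1_V(y + L(x))\,d\mu_{\tilde\varepsilon}(y) \\
&= \int 1_V(y + L(x))\,f(y)\,d\mu_\varepsilon(y) \\
&= \int_V f(y - L(x))\,d\mu_{\varepsilon+L(x)}(y) \\
&= \int_V f(y - L(x)) \exp\Big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \Big)\,d\mu_\varepsilon(y).
\end{aligned}
\]

The integrand is a $\mu_X \times \mu_\varepsilon$-measurable function as a product of two $\mu_X \times \mu_\varepsilon$-measurable
functions. By Theorem 3.3, the posterior distribution of $X$ given a
sample $y$ of $Y = L(X) + \tilde\varepsilon$ can be taken to be

\[
(22) \qquad \mu(U, y) = \frac{\int_U f(y - L(x)) \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}{\int f(y - L(x)) \exp\big( \langle y, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}
\]

whenever the denominator is positive.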

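For instance, let $\tilde\varepsilon$ be a restriction of $\varepsilon$ to some open set $K \in \mathcal{G}$ that has
positive $\mu_\varepsilon$-measure. This means that the noise $\tilde\varepsilon = \varepsilon|_K$ has the distribution

\[
(23) \qquad \mu_{\tilde\varepsilon}(V) = \frac{\mu_\varepsilon(V \cap K)}{\mu_\varepsilon(K)}
\]

for all $V \in \mathcal{G}$, i.e. we consider conditional probabilities

\[ \mu_{\tilde\varepsilon}(V) = \mu_\varepsilon(V \,|\, K). \]

Note that as a Borel set, $K$ is of the form $K = \{y \in G : (\langle y,\phi_1\rangle, \langle y,\phi_2\rangle, \ldots) \in E\}$,
where $\phi_i \in G'$ separate the points in $G$ and $E \in \mathcal{B}(\mathbb{R}^\infty)$. The Radon-Nikodym
density of $\mu_{\tilde\varepsilon}$ with respect to $\mu_\varepsilon$ is by (23)

\[ f(y) = \frac{d\mu_{\tilde\varepsilon}}{d\mu_\varepsilon}(y) = \frac{1}{\mu_\varepsilon(K)}\,1_K(y). \]

By Theorem 3.3, an essentially unique posterior distribution of $X$ given a sample
$y_0$ of $Y = L(X) + \tilde\varepsilon$ can be represented as

\[
\mu(U, y_0) = \frac{\int_U 1_K(y_0 - L(x)) \exp\big( \langle y_0, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}{\int 1_K(y_0 - L(x)) \exp\big( \langle y_0, C_\varepsilon^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_\varepsilon}}^2 \big)\,d\mu_X(x)}
\]

whenever the denominator is positive. We see that when we can exclude noise
patterns, the posterior distribution will concentrate more on the true value $x_0$ (when
$L$ is injective). When $\mu_X(L^{-1}(\partial(\{y_0\} - K))) = 0$, the mapping $x \mapsto 1_K(y_0 - L(x))$
is continuous on a set of full $\mu_X$-measure and the convergence results are hence
available.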

For example, take G = F = H^^{a, b), where —oo < a < b < oo, and set Lx = 
Ci{x, ei)ei for all x G L^(a, b), where {ei}°^^ is an orthonormal basis of L^{a, b) 
and the constants q > satisfy {(1 + i)ci}^i ^ Then L : L'^{a, b) H^{a, b) 
is continuous. Set X = J^i^i ^i^i ^"^^ ^ — Y^TLi ^i^i^ where Ei and Xi, i g N, are 
independent standard normal random variables. Set 

A: 

K = {yeG:\Y^{y,e,)\<C}. 

i=l 

Then ^i,{K) > and 

fc 

^ix{L-\^{{yo} - K))) = ^lx{L-\{yeG■.\J2{y^^^) + {yo,e^)\^C}) 

k 

= fix{{y e L^{a,b) : |^c,((2/,e,) + {yo,e,))\ = G}) 

fc 

= 1[P{\Z\^G) = 0, 

4=1 

where Z = X]i=i + {yo, ^i)) is a Gaussian random variable. 

The partial uniqueness with respect to the topology of G remains an open ques- 
tion. 

Another example arises from the Girsanov formula. We equip $G = C([0,T])$,
where $T > 0$, with the usual supremum norm. The space $G$ is then a complete
separable Banach space and its dual space $G'$ is the space of Radon measures on
$[0,T]$. We assume that the observation is of the form $Y_t = L(X)_t + \tilde\varepsilon_t$ for $0 \le t \le T$,
where the $F$-valued $X$ and the $C([0,T])$-valued $\tilde\varepsilon$ are statistically independent and $L : F \to
C([0,T])$ is a continuous mapping. More precisely, we assume a stronger condition
that L : F — > Cg (0, T) is continuous. 

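Suppose that the noise $\tilde\varepsilon \in G$ is of the form

\[ \tilde\varepsilon_t = \varepsilon_t + \int_0^t a(s, \varepsilon_s)\,ds, \]

where $\varepsilon_t$ is an ordinary Brownian motion on $[0,T]$ and $a : [0,T] \times \mathbb{R} \to \mathbb{R}$ is
continuous. Note that $\tilde\varepsilon_t$ indeed is a $C([0,T])$-valued random variable since the
continuous point evaluations $\{\delta_t : t \in \mathbb{Q} \cap [0,T]\}$ separate the points in $G$ and, therefore,
also generate the $\sigma$-algebra of $G$.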

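It is well-known that the Cameron-Martin space of the Brownian motion on
$[0,T]$ is the separable Hilbert space $\{f \in H^1(0,T) : f(0) = 0\}$ equipped with the
norm $\|f'\|_{L^2(0,T)}$, the covariance operator $C_\varepsilon$ has kernel $\min(t,s)$ and $C_\varepsilon^{-1} = -\frac{d^2}{dt^2}$ on
$\{f \in H^2(0,T) : f(0) = 0, f'(T) = 0\}$ (see [12]). By the Cameron-Martin theorem

\[
\frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(y) = \exp\Big( \int_0^T \frac{dL(x)_s}{ds}\,dy_s - \frac{1}{2}\int_0^T \Big|\frac{dL(x)_s}{ds}\Big|^2 ds \Big).
\]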
The Girsanov formula

\[
\frac{d\mu_{\tilde\varepsilon}}{d\mu_\varepsilon}(y) = \exp\Big( \int_0^T a(s, y_s)\,dy_s - \frac{1}{2}\int_0^T |a(s, y_s)|^2\,ds \Big),
\]

where the first integral is a sample of the corresponding stochastic integral, holds
when Novikov's condition

\[
(24) \qquad E\Big[ \exp\Big( \frac{1}{2}\int_0^T |a(s, \varepsilon_s)|^2\,ds \Big) \Big] < \infty
\]

is satisfied (see [HI]). For example, if $|a(s,x)| \le C(1 + |x|)$ for some $C > 0$, then
(24) holds since 

by the Fernique theorem (see Corollary 2.8.6 in [12]). For instance, take $a(s,x) =$
Y^l^. By the Ito formula, we see that the mapping 

£2 

extends to a continuous functional on C([0, T]). Thus y i— > ^^(y) has a continuous 
version 

ds] 



£ ^ I a{es)des = ln(l + e^) - / 7;— I^'^* 



2ys 



on C([0,r]). As in (22), we obtain an explicit solution 



/^(t/,2/) 



of the statistical inverse problem of estimating the distribution of $X$ given the
observation $Y_t = L(X)_t + \varepsilon_t + \int_0^t a(s, \varepsilon_s)\,ds$ on $[0,T]$. The posterior convergence
results are available for approximated prior distributions.

In general, any $G$-valued random variable $\tilde\varepsilon$ whose image measure is absolutely
continuous with respect to a zero mean Gaussian measure $\mu_\varepsilon$ satisfies

in distribution for some mapping $T : G \to H_{\mu_\varepsilon}$ (see Corollary 4.2 in [14]).

5.4. Spherically invariant noise. Let $F$ and $G$ be locally convex Souslin topological
vector spaces. We say that $\varepsilon$ is a spherically invariant $G$-valued random
variable if $\varepsilon = \gamma Z$, where $Z$ is a zero-mean Gaussian $G$-valued random variable
whose Cameron-Martin space is infinite-dimensional, and $Z$ is statistically independent
from a non-negative real-valued random variable $\gamma$ whose distribution has no
atom at zero.

The expression "spherically invariant random process (SIRP)" is used in the
engineering literature [158] while the more descriptive but little used expression
"spherically symmetric measure" appears in the mathematical literature (see
Definition 7.4.1 in [12]). The latter emphasizes the fact that the measure is
only invariant with respect to orthogonal operators on $H_{\mu_Z}$ (see Theorem 7.4.2 in
[12]).

In order to study the posterior measure of $X$ given $Y = L(X) + \gamma Z$, we apply
an averaging principle together with the following lemma.
Lemma 5.1. Let $F$ and $G$ be locally convex Souslin topological vector spaces. Let
$Z$ be a zero-mean Gaussian $G$-valued random variable whose Cameron-Martin space
is infinite-dimensional. Let $X$ be an $F$-valued random variable, and let $\gamma$ be a
non-negative random variable whose distribution has no atom at zero. Suppose that $\gamma$,
$X$ and $Z$ are statistically independent.

Let $L : F \to G$ be a continuous mapping such that $L(F) \subset H_{\mu_Z}$, where $H_{\mu_Z}$ is
the Cameron-Martin space of $\mu_Z$. Let $\{e_i\}_{i=1}^{\infty}$ be an orthonormal basis of $H_{\mu_Z}$ such
that $C_Z^{-1}e_i \in G'$, where $C_Z$ is the covariance operator of $Z$. Set $Y = L(X) + \gamma Z$.

For any $f \in L^1(\mu_{(Y,\gamma)})$ the conditional expectation

\[ E[f(Y,\gamma)\,|\,\sigma(Y)](\omega) = f(Y(\omega), \gamma_{Y(\omega)}) \]

for $P$-almost every $\omega \in \Omega$, where $y \mapsto \gamma_y$ is a $\mathcal{G}$-measurable function on $G$ that
satisfies

\[
(25) \qquad \gamma_y = \Big( \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle y, C_Z^{-1}e_i\rangle^2 \Big)^{1/2}
\]

whenever a finite limit exists and $\gamma_y = 0$ otherwise.

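Proof. The mapping $y \mapsto \gamma_y$ is indeed measurable since the set

\[
N = \Big\{ y \in G : \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle y, C_Z^{-1}e_i\rangle^2 \text{ exists} \Big\}
\]

is a Borel set (see Lemma 2.1.7 in [13]). We show in a moment that $\gamma = \gamma_Y$ $P$-almost
surely. Then the conditional expectations of $f(Y,\gamma)$ and $f(Y,\gamma_Y)$ coincide since the
two random variables coincide almost surely. In order to conclude the claim, we note
that $\gamma_{Y(\omega)}$ is $Y^{-1}(\mathcal{G})$-measurable as a combination of two measurable functions.

The random variables $\gamma$, $X$ and $Z$ are statistically independent, which implies
that their image measure $\mu_{(\gamma,X,Z)}$ is a product measure on the product space
$\mathbb{R}_+ \times F \times G$.

Since $Z$ has a Gaussian distribution, the random variables $\langle Z, C_Z^{-1}e_i\rangle$ are
statistically independent standard normal random variables. The same holds for the
random variables $(t,x,z) \mapsto \langle z, C_Z^{-1}e_i\rangle$ on the measure space
$(\mathbb{R}_+ \times F \times G, \mathcal{B}(\mathbb{R}_+ \times F \times G), \mu_\gamma \otimes \mu_X \otimes \mu_Z)$. The random variable

\[ (t,x,z) \mapsto L(x) + tz \]

has the following property. The law of large numbers implies that

\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle L(x) + tz, C_Z^{-1}e_i\rangle^2 = t^2 \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle z, C_Z^{-1}e_i\rangle^2 = t^2
\]

for any $t \in \mathbb{R}_+$, $x \in F$, and $\mu_Z$-a.e. $z \in G$. Since the image measure has the
product structure, this also holds for $\mu_{(\gamma,X,Z)}$-almost every $(t,x,z)$. Hence,

\[
(\gamma, X, Z)^{-1}\Big\{ (t,x,z) : \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle L(x) + tz, C_Z^{-1}e_i\rangle^2 = t^2 \Big\}
\]

has full $P$-measure, i.e.

\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle L(X) + \gamma Z, C_Z^{-1}e_i\rangle^2 = \gamma^2
\]

$P$-almost surely. □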
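The law-of-large-numbers argument can be tried out on a truncated model. In the sketch below the coordinates $\langle y, C_Z^{-1}e_i\rangle$ are simulated as $h_i + \gamma\,\xi_i$, where $\xi_i$ are independent standard normal variables and $h_i$ are square-summable coefficients of $L(x)$ in the basis $\{e_i\}$; the values $\gamma = 2.5$, $h_i = 1/i$ and the truncation level are hypothetical choices made only for this illustration.

```python
import math
import random

def estimate_gamma(coordinates):
    """Square root of the empirical mean of the squared coordinates <y, C_Z^{-1} e_i>."""
    return math.sqrt(sum(c * c for c in coordinates) / len(coordinates))

rng = random.Random(0)                        # fixed seed for reproducibility
n = 200000                                    # truncation level (hypothetical)
gamma = 2.5                                   # hypothetical true scale of the noise
signal = [1.0 / i for i in range(1, n + 1)]   # hypothetical square-summable coefficients of L(x)
coordinates = [s + gamma * rng.gauss(0.0, 1.0) for s in signal]
gamma_hat = estimate_gamma(coordinates)
```

Because the signal coefficients are square summable, their contribution to the average vanishes as `n` grows, and `gamma_hat` approaches the scale `gamma` from a single observation.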
The averaging principle for the posterior distributions is given in the following
lemma. Note that topological products of Souslin spaces are also Souslin spaces.

Lemma 5.2. Let the assumptions of Lemma 5.1 hold.

A solution $\mu(\cdot, y)$ of the statistical inverse problem of estimating the distribution
of $X$ given a sample $y$ of $Y = L(X) + \gamma Z$ coincides $\mu_Y$-almost surely with a Borel
measurable solution $\tilde\mu(\cdot, (y, \gamma_y))$ of the statistical inverse problem of estimating the
distribution of $X$ given $(Y, \gamma) = (y, \gamma_y)$, where $\gamma_y$ is defined by (25).

Proof. The $\sigma$-algebra $\sigma(Y)$ generated by $Y = L(X) + \gamma Z$ is a sub-$\sigma$-algebra of
the $\sigma$-algebra $\sigma((Y,\gamma))$ generated by the $G \times \mathbb{R}_+$-valued random variable $(Y,\gamma) =
(\gamma Z + L(X), \gamma)$. By Lemma 5.1 and a property of conditional expectations, the
solutions satisfy

\[
\begin{aligned}
\mu(U, Y) &= E[1_U(X)\,|\,\sigma(Y)] = E\big[E[1_U(X)\,|\,\sigma(Y,\gamma)]\,\big|\,\sigma(Y)\big] \\
&= E[\tilde\mu(U, (Y,\gamma))\,|\,\sigma(Y)] = \tilde\mu(U, (Y, \gamma_Y))
\end{aligned}
\]

almost surely for a fixed $U \in \mathcal{F}$. It is easy to see that $y \mapsto \tilde\mu(U, (y, \gamma_y))$ is
Borel-measurable. By the Souslin property, it is enough to consider only countably many
$U \in \mathcal{F}$ in order to identify the two measures. Hence, $(U, y) \mapsto \tilde\mu(U, (y, \gamma_y))$ is a
solution of the statistical inverse problem of estimating the distribution of $X$ given
a sample $y$ of $Y = L(X) + \gamma Z$. □

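Theorem 5.3. Let the assumptions of Lemma 5.1 hold. The essentially unique
posterior distribution of $X$ given a sample $y$ of $Y = L(X) + \gamma Z$ has a version

\[
(26) \qquad \mu(U, y) = \frac{\int_U \exp\Big( \langle y, \gamma_y^{-2} C_Z^{-1}L(x)\rangle - \frac{1}{2\gamma_y^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_X(x)}{\int \exp\Big( \langle y, \gamma_y^{-2} C_Z^{-1}L(x)\rangle - \frac{1}{2\gamma_y^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_X(x)}
\]

for all $y \in G$ such that the limit

\[ \gamma_y^2 = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \langle y, C_Z^{-1}e_i\rangle^2 \]

exists and does not vanish.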

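Proof. Let us calculate the posterior distribution of $X$ given $(Y, \gamma)$.

The conditional distribution of $(Y, \gamma)$ given a sample $x$ of $X$ is $\mu_{(\gamma Z+L(x),\gamma)}(C \times
B)$, where $C \in \mathcal{G}$ and $B \in \mathcal{B}(\mathbb{R}_+)$, by Lemma 3.2. Furthermore, the conditional
distribution of $L(x) + \gamma Z$ given $\sigma((\gamma, X))$ is $\mu_{\gamma(\omega)Z+L(x)}$. Taking conditional
expectations inside the integral gives

\[
\begin{aligned}
\mu_{(\gamma Z+L(x),\gamma)}(C \times B) &= P(\gamma Z + L(x) \in C \cap \gamma \in B) \\
&= E[1_C(\gamma Z + L(x))\,1_B(\gamma)] \\
&= E\big[E[1_C(\gamma Z + L(x))\,|\,\sigma(\gamma)]\,1_B(\gamma)\big] \\
&= \int \mu_{aZ+L(x)}(C)\,1_B(a)\,d\mu_\gamma(a).
\end{aligned}
\]

We may now use the absolute continuity of the translated measures $\mu_{aZ+L(x)}$ with
respect to $\mu_{aZ}$, which follows from the Cameron-Martin theorem. We obtain

\[
\begin{aligned}
\mu_{(\gamma Z+L(x),\gamma)}(C \times B)
&= \int_B \int_C \exp\Big( \langle y, a^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2a^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_{aZ}(y)\,d\mu_\gamma(a) \\
&= E\Big[E\Big[1_{C\times B}(\gamma Z, \gamma)\,e^{\langle \gamma Z, \gamma^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2\gamma^2}\|L(x)\|_{H_{\mu_Z}}^2}\,\Big|\,\sigma(\gamma)\Big]\Big] \\
&= \int_{C\times B} \exp\Big( \langle y, a^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2a^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_{(\gamma Z,\gamma)}(y,a).
\end{aligned}
\]

Hence, the Radon-Nikodym derivative of $\mu_{(\gamma Z+L(x),\gamma)}$ with respect to $\mu_{(\gamma Z,\gamma)}$ is

\[ \exp\Big( \langle y, a^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2a^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big). \]

The posterior distribution of $X$ given a sample $(y, a)$ of $(Y, \gamma)$ has a version

\[
\tilde\mu(U, (y, a)) = \frac{\int_U \exp\Big( \langle y, a^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2a^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_X(x)}{\int \exp\Big( \langle y, a^{-2}C_Z^{-1}L(x)\rangle - \frac{1}{2a^2}\|L(x)\|_{H_{\mu_Z}}^2 \Big)\,d\mu_X(x)}
\]

for all $y \in G$ and $a \neq 0$. We obtain the required result by Lemma 5.2. □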

Posterior convergence holds under the same conditions as in the Gaussian case. 

Remark 11. The posterior distribution (26) does not depend on the distribution
of $\gamma$. In particular, $\gamma$ does not necessarily have finite moments.

Remark 12. If the sample $y \in H_{\mu_Z}$, then the estimated random number $\gamma_y = 0$.
Consequently, we cannot apply Theorem 2.7 for the solution (26) on any measurable
linear subspace of $G$ of full $\mu_\varepsilon$-measure, since it contains the Cameron-Martin
space $H_{\mu_Z}$. Besides the Lusin theorem, nothing seems to be known about the
continuity of the measurable function $y \mapsto \gamma_y$. Even though the continuity of the
posterior distribution as a function of observations remains an open question, we
can anticipate from the form of the posterior distribution that the prior distribution
will have a good regularizing effect on the corresponding ill-posed inverse problem.

Following [125], we call $\varepsilon = \gamma Z$ a symmetric $\alpha$-stable sub-Gaussian $G$-valued
random variable if $\gamma = \sqrt{\Gamma}$, where the non-negative random variable $\Gamma$ satisfies

\[ E[e^{-t\Gamma}] = e^{-t^{\alpha/2}}, \qquad t > 0, \]

for some $0 < \alpha < 2$, and $Z$ is a zero mean $G$-valued Gaussian random variable.
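For one value of $\alpha$ the mixing variable can be sampled in closed form, which gives a concrete instance of the definition. Assuming the defining condition reads $E[e^{-t\Gamma}] = e^{-t^{\alpha/2}}$, the case $\alpha = 1$ is obtained with $\Gamma = 1/(2W^2)$ for a standard normal $W$, since $1/W^2$ has Laplace transform $e^{-\sqrt{2t}}$. The sketch below checks this by Monte Carlo; the sample size and function names are illustrative assumptions.

```python
import math
import random

def sample_mixing_variable(rng):
    """Draw Gamma = 1 / (2 W^2) with W standard normal (a scaled Levy variable)."""
    w = rng.gauss(0.0, 1.0)
    while w * w == 0.0:          # guard against division by zero
        w = rng.gauss(0.0, 1.0)
    return 1.0 / (2.0 * w * w)

def empirical_laplace_transform(t, samples):
    """Monte Carlo estimate of E[exp(-t * Gamma)]."""
    return sum(math.exp(-t * g) for g in samples) / len(samples)

rng = random.Random(0)           # fixed seed for reproducibility
samples = [sample_mixing_variable(rng) for _ in range(200000)]
estimate = empirical_laplace_transform(1.0, samples)
target = math.exp(-1.0)          # exp(-t^(alpha/2)) with t = 1 and alpha = 1
```

With this $\Gamma$, the scalar $\sqrt{\Gamma}\,Z_1$ for an independent standard normal $Z_1$ is a ratio of independent normals, so each coordinate of the noise is Cauchy distributed, which illustrates the heavy tails discussed below.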

For instance, $\alpha$-stable random variables are used as approximative models for
ambient noise. An example of ambient noise is the acoustic noise in oceans origi- 
nating from e.g. shipping, rain fall, waves, animal activity, bubbles, cracking of ice 
and geological processes [66, 146]. It disturbs acoustic communication and active 
acoustic remote sensing in underwater environments [20, 92]. The finite-dimensional 
distributions of ambient noise are thought to originate from many disturbances oc- 
curring in natural environments: typically few strong and a large number of weak 
disturbances of different orders. The variances of individual disturbances are often 
such that Lindeberg's condition, which is a sufficient condition (and in some cases 
also necessary) for the applicability of the classical central limit theorem, does not 
hold [152]. A generalized central limit theorem states that a.s. converging sums
of independent random variables necessarily have stable distributions (see Definition
1.1.5 in [125]). Non-Gaussian stable distributions exhibit heavy tails, which
explains why the Gaussian distributions are not the best ones for modeling ambient
noise. Symmetric $\alpha$-stable sub-Gaussian random variables are perhaps the
simplest subclass of stable distributions.

Sub-Gaussian noise is encountered also in fMRI (functional magnetic resonance 
imaging), where it models physiological noise, e.g. disturbances originating from 
breathing and heartbeat [17]. 

Spherically symmetric noise models are used also as approximative models in high
resolution radar imaging for describing the ground-clutter (i.e. unwanted echoes of
the transmitted radar signal from the ground), and also sea-clutter (i.e. echoes
from the surface of the sea) [22, 23, 24]. It should be noted that the modeling of
radar clutter and underwater noise is not yet a mature field of science. Besides
spherically symmetric models, other models have also been developed, and better
models are being pursued.

As a rule of thumb, noise is usually rougher than the signal. In the above applications,
it is not verified whether this holds for the noise $\varepsilon$ and signals $L(x)$, where
$x \in F$. For radar imaging this is not a critical point since the reflected signal
acquires some regularity from the transmitted signal.

5.5. Subordinated noise. We consider another generalization of Gaussian noise
that is similar to spherically symmetric noise.

Let $B_t$ be a Brownian motion on $\mathbb{R}_+$ satisfying $B_0 = 0$ almost surely. Subordinated
noise is here defined as a time-changed process

\[ \varepsilon_t = B_{a_t}, \]

where $a_t$ is a strictly increasing stochastic process that is statistically independent
from the Brownian motion $B_t$. We assume that $a_t$ has bi-Lipschitz-continuous
sample paths and satisfies $a_0 = 0$. For example, $a_t$ can be an integral function
of some statistically independent Gamma process starting from a non-zero value.
Such a distribution of $a$ can reflect inaccuracies that are believed to be present in
the covariance operator of the noise $\varepsilon$.

Lemma 5.4. The random function $\varepsilon_\cdot$ on $[0,1]$ is a $C([0,1])$-valued random variable.

Proof. The sample paths of $\varepsilon$ are continuous functions as compositions of continuous
functions. Moreover, the space $C([0,1])$ is a separable Fréchet space, which implies
that its Borel $\sigma$-algebra is generated by the cylinder sets

\[ A = \{ f \in C([0,1]) : f(t_i) \in U_i \ \forall i \in I \}, \]

where $U_i \in \mathcal{B}(\mathbb{R})$, $I \subset \mathbb{N}$ are finite sets, and $\{t_i\}_{i\in\mathbb{N}}$ is a dense subset of $[0,1]$ (see
Theorem A.3.7 in [12]). It is enough to check that the mapping

\[ \omega \mapsto B_{a_{t_i}(\omega)}(\omega) \]

is a random variable for any $t_i \in [0,1]$. But this follows from the joint measurability
of the Brownian motion from $[0,1] \times \Omega$ into $\mathbb{R}$. □

We take $G = C([0,1])$, $\mathcal{G} = \mathcal{B}(C([0,1]))$, and denote $\mu_\varepsilon(A) = P(\varepsilon_\cdot \in A)$ for any
Borel set $A \subset C([0,1])$.

Lemma 5.5. The Gaussian measure $\mu_{B_{a(\omega)}+L(x)}$ that has mean $L(x)$ and the
covariance operator with kernel $C_{a(\omega)}(t,s) = \min(a_t(\omega), a_s(\omega))$ on $[0,1] \times [0,1]$ is a
version of the conditional probability $V \mapsto E[1_V(B_a + L(x))\,|\,\sigma(a)](\omega)$ on $\mathcal{G}$.

Proof. By defining $B_t = 1_{t \ge 0} B_t$ on $\mathbb{R}$, the Brownian motion extends to a $C(\mathbb{R})$-valued
random variable, where $C(\mathbb{R})$ is equipped with the Borel $\sigma$-algebra with
respect to the locally convex topology given by the family of seminorms

\[ p_i(f) = \sup_{t \in K_i} |f(t)|, \]

where $K_i = [-i, i]$ and $i \in \mathbb{N}$ (i.e. the topology of uniform convergence on compact
sets). The space $C(\mathbb{R})$ is then a locally convex Souslin space, since its topology is
metrizable by a complete metric

\[ d(f_1, f_2) = \sum_{i=1}^{\infty} 2^{-i}\,\frac{p_i(f_1 - f_2)}{1 + p_i(f_1 - f_2)} \]

and the polynomials with rational coefficients form a dense set by the Stone-Weierstrass
theorem. Moreover, $a$ is a $C([0,1])$-valued random variable.

Recalling Lemma 3.2, we need to check that the composition mapping $(f,g) \mapsto
f \circ g + L(x)$ is Borel measurable from $C(\mathbb{R}) \times C([0,1])$ into $C([0,1])$. Since point
evaluations generate the Borel $\sigma$-algebra of $C([0,1])$, it is enough to show that the
functionals $(f,g) \mapsto f \circ g(t) + L(x)_t$ are Borel measurable for a fixed $t \in [0,1]$. We
show that this function is actually continuous. Since both spaces are metric
spaces, it is enough to check the sequential continuity on the product space $C(\mathbb{R}) \times
C([0,1])$, which is metrizable.

Let $\lim_{i\to\infty}(f_i, g_i) = (f, g)$ in $C(\mathbb{R}) \times C([0,1])$, which implies that $\lim_{i\to\infty} f_i = f$
and $\lim_{i\to\infty} g_i = g$ in the corresponding spaces. Then $K = \{g_i(t) \in \mathbb{R} : i \in \mathbb{N}\} \cup \{g(t)\}$ is
compact for the fixed $t \in [0,1]$ and

\[
|f_i(g_i(t)) - f(g(t))| \le \sup_{s \in K} |f_i(s) - f(s)| + |f(g_i(t)) - f(g(t))| \to 0
\]

as $i \to \infty$ by the convergence of $(f_i, g_i)$ and the continuity of $f$. □

Theorem 5.6. Let $F$ be a locally convex Souslin topological vector space equipped
with its Borel $\sigma$-algebra $\mathcal{F}$ and let $L : F \to H^1([0,1])$ be a continuous mapping that
satisfies $L(x)|_{t=0} = 0$ for all $x \in F$. Let $B_t$ be a Brownian motion on $[0,1]$ starting
from zero. Let $a_t$ be a strictly increasing stochastic process that is statistically
independent from the Brownian motion $B_t$ and that has bi-Lipschitz continuous sample
paths satisfying $a(0) = 0$ almost surely. Let $X$ be an $F$-valued random variable that
is statistically independent from the Brownian motion $B_t$ and the stochastic process
$a_t$.

The essentially unique solution of estimating the distribution of $X$ given a sample
path $y : [0,1] \to \mathbb{R}$ of $Y_t = L(X)_t + B_{a_t}$ has a version $\mu$ such that

\[
\mu(U, y) = \frac{\int_U \exp\Big( \langle y, C_{[y]}^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_{B_{[y]}}}}^2 \Big)\,d\mu_X(x)}{\int \exp\Big( \langle y, C_{[y]}^{-1}L(x)\rangle - \frac{1}{2}\|L(x)\|_{H_{\mu_{B_{[y]}}}}^2 \Big)\,d\mu_X(x)}
\]

for any $U \in \mathcal{F}$ and for any $y \in C([0,1])$ such that its quadratic variation $[y]$ satisfies
$0 < [y]_t < \infty$ for all $t \in (0,1]$.

Proof. Let $g$ be some sample of $a$ on $[0,1]$. The mapping

$$T_g : f \mapsto f \circ g$$

is linear and measurable from $C(\mathbb{R}_+)$ to $C([0,1])$. Hence, the Cameron-Martin space of $T_g B = B_g$ coincides with $T_g(H^1(\mathbb{R}_+))$ as a vector space (see Theorems 3.7.3 and 3.7.6 in [12]; choose $X = C(\mathbb{R}_+) \times C([0,1])$ in order to generalize the claim to the present situation). Since $g$ is bi-Lipschitz continuous, the mapping $f \circ g$ is in $H^1(0,1)$ whenever $f \in H^1(g(0,1))$ (e.g. Theorem 2.2.2 in [165]), and the mapping is actually onto the subspace $H = \{f \in H^1(0,1) : f(0) = 0\}$. In particular, the vector $L(x) \in H$ by the assumption, so by Lemma 5.5



as in Theorem 5.3. It is well known that for continuous time-changes $a_t$ the quadratic variation of $B_{a_t}$ coincides with $a_t$ (see Chapter 5: Proposition 1.5 in [121]). Therefore, $a_t$ is a measurable function of the sample paths of $B_{a_t}$ (the quadratic variation is obtained by taking a limit in probability, and we need to pick a subsequence in order to get a.s. convergence). Since $L(X) \in H^1(0,1)$, it has finite variation, which implies that its quadratic variation vanishes. Hence $\mu_Y$-almost every sample path of $L(X)_t + B_{a_t}$ has $a_t$ as its quadratic variation. We obtain the claim similarly as in Lemma 5.2. □

Posterior convergence holds similarly as in the Gaussian case. The assumptions 
that guarantee the continuity of the solution are not known. 

5.6. Decomposable additive noise. Let $F$ and $G$ be locally convex Souslin topological vector spaces. We say that $G$-valued random noise $\varepsilon$ is decomposable if it is of the form

$$\varepsilon = \sum_{i=1}^{\infty} \varepsilon_i f_i,$$

where $\varepsilon_i$ are independent random variables with a.e. positive probability density functions $p_i$ with respect to the Lebesgue measure and $f_i \in G$ are some non-zero vectors.

Remark 13. If $\varepsilon_i$ are random variables and $\varepsilon := \sum_{i=1}^{\infty} \varepsilon_i f_i$ a.s. for some vectors $f_i \in G$, then $\varepsilon$ is a $G$-valued random variable. Indeed, since $G$ is a Souslin topological vector space, the mapping $\mathbb{R} \times G \ni (a, f) \mapsto af =: T(a, f)$ is continuous, therefore also $\mathcal{B}(\mathbb{R} \times G) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{G}$ measurable. The composition of the measurable mapping $(\omega, f) \mapsto (\varepsilon_i(\omega), f)$ with $T$ gives a $G$-valued random variable $T(\varepsilon_i, f) = \varepsilon_i f$. Also the sum of two $G$-valued random variables is a $G$-valued random variable, and limits of locally convex Souslin space-valued random variables are random variables (since the cylinder sets generate the Borel $\sigma$-algebra by Theorem 6.8.9 in [13]).

If all possible signals $L(x)$, $x \in F$, are sparse in the sense that they belong to the linear span of $\{f_i : i \in \mathbb{N}\}$ and the noise $\varepsilon$ is decomposable, then the measures $\mu_{\varepsilon+L(x)}$ are absolutely continuous with respect to $\mu_\varepsilon$ [126].

Moreover, if $\{f_i\}_{i=1}^{\infty}$ is a basis of the closed subspace $\overline{\mathrm{span}}(\{f_i : i \in \mathbb{N}\})$, the proof in [126] gives, with minor additional work, an explicit formula for the Radon-Nikodym density. For simplicity, we take $G = \overline{\mathrm{span}}(\{f_i : i \in \mathbb{N}\})$.

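Theorem 5.7. Let $G$ be a locally convex Souslin topological vector space equipped with the Borel $\sigma$-algebra $\mathcal{G}$ and a basis $\{f_i\}_{i=1}^{\infty}$ such that the unique coefficients $y_i$ in $y = \sum_{i=1}^{\infty} y_i f_i$ depend measurably on $y \in G$. Let a $G$-valued random variable $\varepsilon$ be of the form $\varepsilon = \sum_{i=1}^{\infty} \varepsilon_i f_i$, where the random variables $\varepsilon_i$ are statistically independent and have probability density functions $p_i$ that are a.e. positive. If $L_n(x) = \sum_{k=1}^{n} a_{i_k}(x) f_{i_k}$, then

$$\frac{d\mu_{\varepsilon+L_n(x)}}{d\mu_\varepsilon}(y) = \prod_{k=1}^{n} \frac{p_{i_k}(y_{i_k} - a_{i_k}(x))}{p_{i_k}(y_{i_k})} \qquad (27)$$

for $\mu_\varepsilon$-almost every $y = \sum_{i=1}^{\infty} y_i f_i$.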



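Proof. Let $A \in \mathcal{G}$. By possibly rearranging finitely many vectors, we may suppose that $L_n(x) = \sum_{i=1}^{n} a_i f_i$. We consider the probability

$$\mu_{\varepsilon+L_n(x)}(A) = P(\varepsilon + L_n(x) \in A) = \mathbf{E}\left[ 1_A\left( \sum_{i=1}^{\infty} \varepsilon_i f_i + \sum_{i=1}^{n} a_i f_i \right) \right].$$

Denote $Z = \sum_{i=n+1}^{\infty} \varepsilon_i f_i$. Following [126], we calculate the conditional expectation of $1_A(\varepsilon + L_n(x))$ given $\sum_{i=1}^{n} (\varepsilon_i + a_i) f_i$ and, by Lemma 3.2, obtain with straightforward calculations

$$\mathbf{E}\left[ 1_A\left( \sum_{i=1}^{\infty} \varepsilon_i f_i + \sum_{i=1}^{n} a_i f_i \right) \right] = \int_{\mathbb{R}^n} \mathbf{E}\left[ 1_A\left( \sum_{i=1}^{n} y_i f_i + Z \right) \right] \prod_{i=1}^{n} p_i(y_i - a_i)\, dy_1 \cdots dy_n.$$

At this point, the proof differs from [126]. Namely, we multiply and divide by the positive densities of $\varepsilon_i$, and obtain

$$\mathbf{E}\left[ 1_A\left( \sum_{i=1}^{\infty} \varepsilon_i f_i + \sum_{i=1}^{n} a_i f_i \right) \right] = \int_{\mathbb{R}^n} \mathbf{E}\left[ 1_A\left( \sum_{i=1}^{n} y_i f_i + Z \right) \right] \prod_{i=1}^{n} \frac{p_i(y_i - a_i)}{p_i(y_i)} \prod_{i=1}^{n} p_i(y_i)\, dy_1 \cdots dy_n = \mathbf{E}\left[ 1_A(\varepsilon) \prod_{i=1}^{n} \frac{p_i(\varepsilon_i - a_i)}{p_i(\varepsilon_i)} \right].$$

Since the unique coefficients depend measurably on $\varepsilon$, we may write

$$\mu_{\varepsilon+L_n(x)}(A) = \int_A \prod_{i=1}^{n} \frac{p_i(y_i - a_i)}{p_i(y_i)}\, d\mu_\varepsilon(y).$$

□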
The above theorem verifies the intuitive picture that for sparse signals we may as well study the posterior of $X$ given the finite-dimensional data

$$\sum_{i=1}^{n} y_i f_i.$$

The following theorem gives a significant enlargement of applicable noise models in statistical inverse problems.

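Corollary 2 (Generalized Cameron-Martin formula). Let $G$ be a locally convex Souslin topological vector space equipped with the Borel $\sigma$-algebra $\mathcal{G}$ and a basis $\{f_i\}_{i=1}^{\infty}$ such that the unique coefficients $y_i$ in $y = \sum_{i=1}^{\infty} y_i f_i$ depend measurably on $y \in G$. Let a $G$-valued random variable $\varepsilon$ be of the form $\varepsilon = \sum_{i=1}^{\infty} \varepsilon_i f_i$, where the random variables $\varepsilon_i$ are statistically independent and have probability density functions $p_i$ that are a.e. positive. If $L(x) = \sum_{i=1}^{\infty} a_i(x) f_i$ for all $x \in F$, and the densities

$$\prod_{i=1}^{n} \frac{p_i(y_i - a_i(x))}{p_i(y_i)}, \qquad n \in \mathbb{N},$$

are uniformly integrable with respect to $\mu_\varepsilon$ and convergent $\mu_\varepsilon$-almost everywhere, then

$$\frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(y) = \prod_{i=1}^{\infty} \frac{p_i(y_i - a_i(x))}{p_i(y_i)}$$

for $\mu_\varepsilon$-almost every $y = \sum_{i=1}^{\infty} y_i f_i$.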

Proof. See Proposition 9.9.10 in [13], which says that if $\lim_n T_n = T$, where $T_n$ and $T$ are measurable mappings on a completely regular space, and the distributions of all $T_n$ have uniformly integrable Radon-Nikodym densities $\rho_n$ with respect to the same Radon probability measure $\nu$, then the distribution of $T$ has a Radon-Nikodym density $\rho$ with respect to the same Radon probability measure as well, and $\rho$ is the limit of $\rho_n$ in the weak topology of $L^1(\nu)$. This result is applicable in particular to the random variables $T_n(x, z) = L_n(x) + z$ and $T(x, z) = L(x) + z$ on $(F \times G, \mathcal{F} \otimes \mathcal{G}, \mu_X \otimes \mu_\varepsilon)$ and the measure $\nu = \mu_\varepsilon$. The integrals of the densities over any Borel set converge. By Theorem 4.5.6 and Corollary 4.5.7 in [13] the weak limit coincides with the almost sure limit. □

In Corollary 2, the Radon-Nikodym density $\frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(y)$ has a form similar to Radon-Nikodym densities appearing in the Kakutani dichotomy theorem, which addresses the equivalence and singularity of infinite product measures on $\mathbb{R}^\infty$ [77]. Also Umemura [145] has given conditions for the absolute continuity of measures on abstract spaces when the corresponding finite-dimensional distributions are absolutely continuous. In our case, Umemura's conditions ask $\sqrt{\frac{d\mu_{\varepsilon+L_n(x)}}{d\mu_\varepsilon}}(y)$ to be a Cauchy sequence in $L^2(\mu_\varepsilon)$. We feel that the uniform integrability of the Radon-Nikodym densities is easier to validate than Umemura's conditions.

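Corollary 3. Let the assumptions of Corollary 2 hold. The essentially unique posterior distribution of $X$ given a sample $y$ of $Y = L(X) + \varepsilon$ has a version $\mu(\cdot, y)$ such that

$$\mu(U, y) = \frac{\int_U \prod_{i=1}^{\infty} \frac{p_i(y_i - a_i(x))}{p_i(y_i)}\, d\mu_X(x)}{\int_F \prod_{i=1}^{\infty} \frac{p_i(y_i - a_i(x))}{p_i(y_i)}\, d\mu_X(x)}$$

whenever $0 < \int_F \prod_{i=1}^{\infty} \frac{p_i(y_i - a_i(x))}{p_i(y_i)}\, d\mu_X(x) < \infty$.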
Remark 14. Signals in non-Gaussian noise may appear in model approximations. Let the true model be

$$Y = L(X) + \varepsilon,$$

where $\varepsilon = \sum_{i=1}^{\infty} \varepsilon_i f_i$ and the $\varepsilon_i$ are statistically independent. When the model $L$ is numerically very complicated, the common practice is to replace $L$ with some simpler approximation $L_n$. For example, $L_n$ may have the form $L_n(X) = \sum_{i=1}^{n} a_i(X) f_i$. Though the true model $L$ and the approximated model $L_n$ are known, the model error $e = L(X) - L_n(X)$ is sometimes replaced with a $G$-valued random variable $e'$ that has the same distribution as $e$ but is statistically independent from $X$ [140]. We note that the observation model is then

$$Y = L_n(X) + e' + \varepsilon,$$

where $e'$ represents the uncertainties in the forward model $L_n$. Besides physical noise, the distribution of the noise may represent our prior beliefs about the uncertainties in the forward model, which do not necessarily have Gaussian distributions.

5.7. Periodic signals in decomposable Laplace noise. In this section, we study an example case of the generalized Cameron-Martin formula for a non-Gaussian noise distribution. A similar distribution has been constructed before by Shimomura [131], who gave conditions under which certain translates of the distribution were equivalent to the original distribution. However, we use the methods of Section 5.6.

One class of inverse problems that involves periodic signals is the class of inverse scattering problems: the far-field pattern of the scattered wave in the 2D fixed energy inverse acoustic or potential scattering problem is a function on the torus. The measured far-field pattern is possibly contaminated by instrumental noise, far-fields of other unknown incoming fields, contributions from other scatterers, and the near-field and plane wave approximation errors. Although the random model below is too simplified to fully cover this case, it shows how periodicity can be utilized in Bayesian inverse problems.

Suppose that $L(x) \in C^\alpha(S^1)$ for all $x \in F$ and some $\alpha > 1$. Then the Fourier coefficients

$$\widehat{L(x)}_k = \frac{1}{2\pi} \int_0^{2\pi} L(x; t)\, e^{-ikt}\, dt$$

are $\ell^1$-summable and the corresponding Fourier series converges to the limit

$$L(x; t) = \sum_{k=-\infty}^{\infty} \widehat{L(x)}_k\, e^{ikt}$$

in $C(S^1)$ (equipped with the usual supremum norm).

Let $\varepsilon_k$ be mutually statistically independent random variables whose probability density functions with respect to the Lebesgue measure are

$$p_k(t) = \frac{1}{2b}\, e^{-|t|/b}$$

for all $k \in \mathbb{Z}$ and some common $b > 0$, i.e. they are zero mean Laplace random variables. The relation of the normal distribution to the Laplace distribution is that a conditionally normal random variable $\varepsilon_k | \sigma \sim N(0, \sigma^2)$ with a Rayleigh distributed standard deviation $\sigma$ (equivalently, an exponentially distributed variance $\sigma^2$) has a Laplace distribution. In statistical inverse problems, one interpretation of the Laplace distribution is that we do not know the error variance exactly and are led to describe our lack of knowledge in the form of a probability distribution.
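The scale-mixture relation can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the Rayleigh scale parameter equals the Laplace scale $b$, and the function names and the midpoint quadrature are illustrative choices.

```python
import math


def laplace_pdf(x, b):
    """Density of a zero-mean Laplace variable with scale b (variance 2*b**2)."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def rayleigh_mixture_pdf(x, b, n_steps=20_000, sigma_max_factor=12.0):
    """Midpoint-rule approximation of the mixture density

        integral_0^inf N(x; 0, sigma^2) * Rayleigh(sigma; scale b) d sigma,

    i.e. a conditionally normal variable whose standard deviation sigma
    is Rayleigh distributed (assumed scale parameter: b).
    """
    sigma_max = sigma_max_factor * b  # the Rayleigh tail beyond this is negligible
    h = sigma_max / n_steps
    total = 0.0
    for j in range(n_steps):
        sigma = (j + 0.5) * h
        normal = math.exp(-x * x / (2.0 * sigma * sigma)) / (
            math.sqrt(2.0 * math.pi) * sigma
        )
        rayleigh = sigma / (b * b) * math.exp(-sigma * sigma / (2.0 * b * b))
        total += normal * rayleigh * h
    return total


if __name__ == "__main__":
    for x in (0.0, 0.5, 2.0):
        print(x, rayleigh_mixture_pdf(x, 1.0), laplace_pdf(x, 1.0))
```

The two columns printed for each $x$ should agree up to quadrature error, in line with the closed-form identity $\int_0^\infty e^{-a/\sigma^2 - c\sigma^2}\,d\sigma = \tfrac12\sqrt{\pi/c}\,e^{-2\sqrt{ac}}$.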
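Lemma 5.8. The random sum

$$\varepsilon = \sum_{k=-\infty}^{\infty} \varepsilon_k e^{ikt}$$

converges in $H^{-1}(S^1)$ and

$$\langle \varepsilon, e^{-ikt} \rangle_{H^{-1}(S^1), H^{1}(S^1)} = \varepsilon_k.$$

Proof. The space $H^{-1}(S^1)$ is a Hilbert space, and the random variables $\varepsilon_k$ have zero mean, so it suffices to prove that $\sum_k \mathbf{E}\|\varepsilon_k e^{ikt}\|^2_{H^{-1}(S^1)} < \infty$ (see Theorem 2 in Chapter 3.2 in [73]). The sequence $\{e^{ikt}\}_{k=-\infty}^{\infty}$ forms an orthonormal basis of $L^2(S^1)$. The imbedding of $L^2(S^1)$ into $H^{-1}(S^1)$ is Hilbert-Schmidt by Maurin's theorem, which implies that

$$\sum_k \|e^{ikt}\|^2_{H^{-1}(S^1)} < \infty$$

and therefore

$$\sum_{k=-\infty}^{\infty} \mathbf{E}\|\varepsilon_k e^{ikt}\|^2_{H^{-1}(S^1)} = 2b^2 \sum_k \|e^{ikt}\|^2_{H^{-1}(S^1)} < \infty.$$

Here we used the fact that the variance of the Laplace random variable $\varepsilon_k$ is $2b^2$. □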
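The summability used in the proof can be made concrete. The sketch below assumes the common norm convention $\|e^{ikt}\|^2_{H^{-1}(S^1)} = (1 + k^2)^{-1}$, which is an assumption on our part (other equivalent norms only change constants); under it the full sum equals $\pi\coth\pi$. Function names are illustrative.

```python
import math


def hs_partial_sum(n):
    """Sum of (1 + k^2)^(-1) over |k| <= n, i.e. the squared Hilbert-Schmidt
    norm of the imbedding L^2 -> H^{-1} restricted to frequencies |k| <= n
    (assumed norm convention, see the lead-in)."""
    return 1.0 + 2.0 * sum(1.0 / (1.0 + k * k) for k in range(1, n + 1))


def noise_second_moment(b, n):
    """2 b^2 times the partial sum: the second moment in H^{-1} of the
    truncated Laplace series sum_{|k|<=n} eps_k e^{ikt}."""
    return 2.0 * b * b * hs_partial_sum(n)


# Closed form of the full series: sum over all integers k of 1/(1+k^2).
limit = math.pi / math.tanh(math.pi)

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, hs_partial_sum(n), limit)
```

The partial sums increase to the finite limit, so the second moments $2b^2 \sum_{|k|\le n}(1+k^2)^{-1}$ stay bounded, which is the content of the estimate above.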

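Next, we quickly check that $\varepsilon$ is a non-Gaussian random variable. The characteristic function of the random variable $\varepsilon$ is

$$\hat\mu_\varepsilon(\phi) = \prod_{k=-\infty}^{\infty} \frac{1}{1 + b^2 \phi_{-k}^2},$$

where $\phi_k$ is the Fourier coefficient $\frac{1}{2\pi} \int_0^{2\pi} \phi(t) e^{-ikt}\, dt$ of $\phi \in H^1(S^1)$. Note that when $\phi_j \to \psi \in L^2(S^1)$ as $j \to \infty$, then

$$\lim_{j\to\infty} \hat\mu_\varepsilon(\phi_j) = \prod_{k=-\infty}^{\infty} \frac{1}{1 + b^2 \psi_{-k}^2},$$

i.e. the distributions of $\langle \varepsilon, \phi_j \rangle$ converge weakly. In particular, when $\psi_k = \frac{2}{\pi(2|k| - 1)}$, $k \neq 0$, and $\psi_0 = 0$, then

$$\hat\mu_\varepsilon(t\psi) = \frac{1}{\cosh^2(bt)}.$$

Since the weak limits of zero mean Gaussian distributions are always zero mean Gaussian distributions, this shows that $\varepsilon$ is indeed a non-Gaussian random variable.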
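The hyperbolic cosine in this argument comes from Euler's product $\cosh x = \prod_{k \ge 1} \bigl(1 + \tfrac{4x^2}{(2k-1)^2\pi^2}\bigr)$. The sketch below checks this numerically under the assumption, ours for illustration, that the direction is $\psi_k = 2/(\pi(2|k|-1))$ for $k \neq 0$ and $\psi_0 = 0$, and that each Laplace coefficient contributes the factor $1/(1 + b^2 t^2 \psi_k^2)$.

```python
import math


def char_fn_along_psi(t, b, n_terms):
    """Truncated product over 0 < |k| <= n_terms of 1 / (1 + b^2 t^2 psi_k^2)
    with the assumed choice psi_k = 2 / (pi * (2|k| - 1))."""
    prod = 1.0
    for k in range(1, n_terms + 1):
        psi_k = 2.0 / (math.pi * (2 * k - 1))
        factor = 1.0 / (1.0 + (b * t * psi_k) ** 2)
        prod *= factor * factor  # k and -k contribute the same factor
    return prod


def sech_squared(x):
    """Closed form 1 / cosh(x)^2 that the product should approach."""
    return 1.0 / math.cosh(x) ** 2


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, char_fn_along_psi(t, 1.0, 200_000), sech_squared(t))
```

A centered Gaussian characteristic function $g(t) = e^{-ct^2}$ satisfies $g(2) = g(1)^4$, whereas $1/\cosh^2(2) \approx 0.0707$ is more than twice $(1/\cosh^2(1))^4 \approx 0.0311$, which is one quick way to see the non-Gaussianity.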

We wish to study the statistical inverse problem of estimating the probability distribution of $X$ when a sample of

$$Y = L(X) + \varepsilon$$

is known. This means that the inexact observations of the Fourier coefficients of $L(X)$ are assumed to be similarly inaccurate and some components are allowed to have high inaccuracies. Since the Laplace distribution has heavier tails than the Gaussian distribution, it protects against outliers better than the normal distribution. Consider first finite sums

$$L_n(x; t) = \sum_{|k| \le n} \widehat{L(x)}_k\, e^{ikt}.$$
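By Lemma 5.8 and Theorem 5.7, the Radon-Nikodym derivative of the translated measure $\mu_{\varepsilon+L_n(x)}$ with respect to the measure $\mu_\varepsilon$ on $H^{-1}(S^1)$ is

$$\frac{d\mu_{\varepsilon+L_n(x)}}{d\mu_\varepsilon}(y) = \exp\left( \sum_{k=-n}^{n} \left( -b^{-1} |y_k - \widehat{L(x)}_k| + b^{-1} |y_k| \right) \right).$$

By the triangle inequality, we obtain that

$$\bigl|\, |y_k| - |y_k - \widehat{L(x)}_k| \,\bigr| \le |\widehat{L(x)}_k|,$$

which are summable. Therefore, the limit

$$\lim_{n\to\infty} \sum_{k=-n}^{n} \left( |y_k| - |y_k - \widehat{L(x)}_k| \right) = \sum_{k=-\infty}^{\infty} \left( |y_k| - |y_k - \widehat{L(x)}_k| \right)$$

exists.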

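The random variables $\varepsilon + L_n(x)$ converge almost surely to $\varepsilon + L(x)$. Therefore, the corresponding measures converge weakly, i.e. for all Borel sets $A$ whose boundary has $\mu_{\varepsilon+L(x)}$-measure zero it holds that

$$\mu_{\varepsilon+L(x)}(A) = \lim_{n\to\infty} \mu_{\varepsilon+L_n(x)}(A) = \lim_{n\to\infty} \int_A e^{b^{-1} \sum_{k=-n}^{n} (|y_k| - |y_k - \widehat{L(x)}_k|)}\, d\mu_\varepsilon(y) = \int_A e^{b^{-1} \sum_{k=-\infty}^{\infty} (|y_k| - |y_k - \widehat{L(x)}_k|)}\, d\mu_\varepsilon(y)$$

by the Lebesgue dominated convergence theorem. The exponential function is the Radon-Nikodym derivative $\frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(y)$.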

We have shown the following theorem. 

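Theorem 5.9. Let $F$ be a locally convex Souslin topological vector space equipped with its Borel $\sigma$-algebra $\mathcal{F}$. Let $X$ be an $F$-valued random variable and let $L : F \to C^\alpha(S^1)$ be a continuous mapping for some $\alpha > 1$. Let

$$\varepsilon = \sum_{k=-\infty}^{\infty} \varepsilon_k e^{ikt}$$

be an $H^{-1}(S^1)$-valued random variable such that all $\varepsilon_k$, where $k \in \mathbb{Z}$, are mutually statistically independent random variables with probability density functions

$$p_k(t) = \frac{1}{2b}\, e^{-|t|/b}$$

with respect to the Lebesgue measure for some $b > 0$.

The solution of the statistical inverse problem of estimating the distribution of $X$ given a sample $y \in H^{-1}(S^1)$ of $Y = L(X) + \varepsilon$ is essentially unique and has a version $\mu$ such that

$$\mu(U, y) = \frac{\int_U e^{b^{-1} \sum_{k=-\infty}^{\infty} (|y_k| - |y_k - \widehat{L(x)}_k|)}\, d\mu_X(x)}{\int_F e^{b^{-1} \sum_{k=-\infty}^{\infty} (|y_k| - |y_k - \widehat{L(x)}_k|)}\, d\mu_X(x)}$$

for all $U \in \mathcal{F}$ and for all $y \in H^{-1}(S^1)$ such that the denominator is finite and non-zero.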
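To illustrate how the posterior formula might be evaluated, the following sketch assumes a prior supported on finitely many candidate unknowns whose Fourier coefficients $\widehat{L(x)}_k$ are stored as finite lists. The finite prior, the truncation to a finite list of coefficients, and all names are hypothetical choices made only for this example.

```python
import math


def log_density_ratio(y_hat, l_hat, b):
    """b^{-1} * sum_k (|y_k| - |y_k - L_k|) over the stored coefficients.
    Works for real or complex coefficients, since abs() handles both."""
    return sum(abs(y) - abs(y - l) for y, l in zip(y_hat, l_hat)) / b


def posterior_weights(y_hat, candidates, prior, b):
    """Posterior probabilities for a prior supported on finitely many
    candidates, each given by the Fourier coefficients of L(x)."""
    logs = [
        log_density_ratio(y_hat, l_hat, b) + math.log(p)
        for l_hat, p in zip(candidates, prior)
    ]
    m = max(logs)  # subtract the maximum to stabilise the exponentials
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    y_hat = [1.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0]]
    print(posterior_weights(y_hat, candidates, [0.5, 0.5], b=1.0))
```

By the triangle inequality the exponent is bounded in absolute value by $b^{-1}\sum_k |\widehat{L(x)}_k|$, so the weights cannot degenerate as long as the coefficients are $\ell^1$-summable.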
Remark 15. The convergence of posterior distributions holds for example if for all $n \in \mathbb{N}$ it holds that $\mathbf{E}[e^{b^{-1} \|\widehat{L(X_n)}\|_{\ell^1}}] < C$ and $\mathbf{E}[e^{b^{-1} \|\widehat{L(X)}\|_{\ell^1}}] < C$ for some $C > 0$. Under the same conditions, the posterior distributions depend continuously on the observations. Indeed, the posterior distributions are sequentially continuous by the Lebesgue dominated convergence theorem and the continuity of $y \mapsto y_k$, and sequentially continuous functions on Hilbert spaces are continuous. The topological support of $\mu_Y$ coincides with the topological support of $\mu_\varepsilon$ by Lemma 3.5. The topological support of $\mu_\varepsilon$ coincides with the closure of the linear span of $\{e^{ikt}\}_k$ in $H^{-1}(S^1)$, that is, $H^{-1}(S^1)$ (see [126]). By Theorem 2.7, $\mu$ is the only posterior distribution that depends continuously on the observations.



6. Examples of prior approximations

We present some methods for approximating the unknown. We always take the unknown $X$ to be statistically independent from the noise $\varepsilon$. In particular, when the Radon-Nikodym density $\rho(x, z) = \frac{d\mu_{\varepsilon+L(x)}}{d\mu_\varepsilon}(z)$ is bounded and continuous, we do not need to require anything special of $X$ or its converging approximations in order to obtain posterior convergence.

6.1. Random series. As discussed in Section 1.4, random series and wavelet expansions are important devices in defining infinite-dimensional prior models. If a random variable $X$ in a locally convex Souslin topological vector space $F$ can be expressed as an almost surely converging series $X = \sum_{i=1}^{\infty} Z_i \phi_i$, where $\phi_i \in F$ and $Z_i$ are ordinary random variables, we obtain immediately finite-dimensional approximations $X_n := \sum_{i=1}^{n} Z_i \phi_i$ by truncation. The almost sure convergence of the random variables $X_n$ to $X$ implies (by the Lebesgue dominated convergence theorem) that the distributions $\mu_{X_n}$ converge weakly to $\mu_X$. We return to this topic in connection with the linear discretizations of $X$ in Section 6.6.

6.2. Gaussian priors.

6.2.1. Gaussian random series. All Gaussian $F$-valued random variables can be expressed with random series expansions [12]. A typical example is the Karhunen-Loève expansion of a zero mean $L^2(a, b)$-valued Gaussian random variable $X$, where $\phi_i$ are chosen to be normed eigenfunctions of the covariance operator

$$C\phi(t) = \int_a^b \mathbf{E}[X(t) X(s)]\, \phi(s)\, ds$$

on $L^2(a, b)$. Then $X$ can be expressed as $X = \sum_{i=1}^{\infty} Z_i \phi_i$, where the standard normal random variables $Z_i$ are statistically independent, and the truncated series $X_n$ gives an almost surely converging approximation of $X$. The almost sure convergence of $X_n$ to $X$ implies the weak convergence of $\mu_{X_n}$ to $\mu_X$.

6.2.2. Converging covariances. A well-known sufficient condition for the weak convergence of probability measures $m_n$ to a probability measure $m$ on a locally convex Souslin topological vector space $F$ is that

(i) the measures $m_n$ are uniformly tight, i.e. for every $\epsilon > 0$ there exists a compact set $K \subset F$ such that $\sup_n m_n(K^C) < \epsilon$;
(ii) the characteristic functionals $\hat m_n(\phi)$ converge to $\hat m(\phi)$ for every $\phi \in F'$.
According to Prohorov's theorem, (i) implies that each subsequence of $m_n$ has a weakly convergent subsequence (see Theorem 8.6.7 in [13]). Part (ii) identifies the limits of different subsequences. For some spaces, (i) can be deduced from (ii) and the known properties of $\hat m$ (see [147]). These spaces include $\mathbb{R}^n$ and the distribution spaces $\mathcal{S}'(\mathbb{R}^d)$ and $\mathcal{D}'(U)$, $U \subset \mathbb{R}^d$ open.

Let $\mu_n$ and $\mu$ be Gaussian zero mean measures on the distribution space $\mathcal{D}'(U)$, where $U \subset \mathbb{R}^d$ is open. If the covariance operators $C_n$ of $\mu_n$ converge weakly to the covariance operator $C$ of $\mu$, i.e.

$$\lim_{n\to\infty} \langle C_n \phi, \psi \rangle = \langle C\phi, \psi \rangle$$

for every $\phi, \psi \in \mathcal{D}(U)$, then (ii) holds. Moreover, the characteristic functionals $\hat\mu_n$ are then equicontinuous at zero, which is sufficient for the uniform tightness of the sequence $\mu_n$ by Corollary 7.13.10 in [13].

6.2.3. Martingale approximations. Another possibility is to use a special martingale approximation of the unknown $X$. We discretize a separable Hilbert space-valued $X$ with finite-dimensional increasing orthogonal projections $P_n$ on the Cameron-Martin space $H$ of $\mu_X$ by setting

$$X_n = \sum_{j=1}^{n} \langle X, f_j \rangle f_j,$$

where the vectors $f_j$, $j = 1, \ldots, n$, form an orthonormal basis of the finite-dimensional subspace $P_n(H)$ and $\cup_n P_n(H)$ is dense in $H$. Such martingale approximations were introduced to statistical inverse problems in [94]. Then

$$\mathbf{E}[\langle X_n, \phi \rangle \mid X_m] = \langle X_m, \phi \rangle$$

when $m < n$ and $\phi \in F'$. Hence $\langle X_n, \phi \rangle$ is a martingale. This makes $\|X - X_n\|_F$ a reversed submartingale. It is integrable, since

$$\mathbf{E}[\|X - X_n\|_F^2] \le \sum_{i=1}^{\infty} \|(I - P_n) e_i\|_F^2 = \|I - P_n\|_{HS}^2,$$

where $\|I - P_n\|_{HS}$ denotes the Hilbert-Schmidt norm of $I - P_n : H \to F$, and has a limit in $L^1(P)$. Hence, $X - X_n$ converges a.s. to zero (see Theorem 10.6.4 in [39]). The almost sure convergence of $X_n$ to $X$ implies the weak convergence of $\mu_{X_n}$ to $\mu_X$.
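As a hedged illustration of the Hilbert-Schmidt bound, assume (this choice is ours, not the paper's) that $X$ is standard Brownian motion on $[0,1]$ regarded as an $L^2(0,1)$-valued variable and that $P_n$ projects onto the first $n$ Karhunen-Loève directions. The covariance eigenvalues are then $\lambda_i = ((i - \tfrac12)\pi)^{-2}$ with $\sum_i \lambda_i = \tfrac12$, and $\|I - P_n\|_{HS}^2 = \sum_{i > n} \lambda_i$.

```python
import math


def kl_eigenvalue(i):
    """i-th Karhunen-Loeve eigenvalue of Brownian motion on [0, 1] (i >= 1)."""
    return 1.0 / (((i - 0.5) * math.pi) ** 2)


def truncation_error(n):
    """Tail sum of eigenvalues sum_{i > n} lambda_i, computed as the trace 1/2
    minus the first n eigenvalues. Under the assumptions in the lead-in this
    equals the squared Hilbert-Schmidt norm of I - P_n."""
    return 0.5 - sum(kl_eigenvalue(i) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (0, 1, 10, 100, 1000):
        print(n, truncation_error(n))
```

The tail behaves like $1/(\pi^2 n)$, so in this example the mean square error of the truncated series decays only at rate $n^{-1}$.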

6.3. Mappings of Gaussian variables. We consider non-linear functions of continuous Gaussian processes as prior models. We start by deforming a Brownian motion $B_t$, $t \in [0,1]$, with a continuous function $f : \mathbb{R} \to \mathbb{R}$ by setting

$$X_t = f(B_t).$$

We check that $X = X_t$ is a random variable having values in a suitable function space $F$. Obviously, each $X_t$, $t \in [0,1]$, is a random variable. The process $X_t$ also inherits sample-continuity from Brownian motion. A natural choice for $F$ is the space $C([0,1])$ of continuous functions on the compact interval $[0,1]$, equipped with the supremum norm $\|f\|_\infty = \sup_{t \in [0,1]} |f(t)|$. Sample-continuous stochastic processes on $[0,1]$ are $C([0,1])$-valued random variables, since the Borel $\sigma$-algebra $\mathcal{F}$ of $F$ coincides with the smallest $\sigma$-algebra generated by Dirac's delta functions $\delta_t$, $t \in [0,1]$, which are continuous linear forms on $C([0,1])$ (see Proposition 12.2.2 in [39]).

The process $X_t$ can be discretized by replacing the Brownian motion with its piecewise linear interpolation

$$b_n(t) = \sum_{i=1}^{n} B_{t_i} \phi_i(t)$$

on $[0,1]$. Here the functions $\phi_i$ are the usual linear interpolation functions. The interpolation mapping $g \mapsto g_n$ is measurable on $C([0,1])$, and $\lim_{n\to\infty} g_n = g$ in $C([0,1])$. In particular, the approximations $b_n(t)$ converge a.s. to $B_t$ in $C([0,1])$ (the sample path of $B_t$ may first be approximated by a $C^1$-function). Then $X_n(t) = f(b_n(t))$ converge almost surely to $X_t = f(B_t)$ as $n \to \infty$ due to the continuity of $f$. Almost sure convergence implies the weak convergence of the corresponding image measures.

Examples: 1) Take $f(t) = t^2$, i.e. $X_t = B_t^2$ and $X_n(t) = b_n(t)^2$. The nonnegative continuous functions form a measurable set in $C([0,1])$ that has full measure in this case. 2) Take $f(t) = \min(t, 1)$. Then we obtain the approximation $X_n(t) = \min(b_n(t), 1)$ of the function $X_t = \min(B_t, 1)$, which is bounded from above.

The Brownian motion may be replaced with any other stochastic process whose 
sample paths are continuous. 
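The discretization of a deformed process can be sketched as follows. The code simulates one Brownian path on a fine grid (a stand-in for the continuous path), interpolates it linearly through $n + 1$ equally spaced nodes, and applies a continuous deformation. The equally spaced nodes, the fixed seed, and all function names are assumptions made for illustration.

```python
import random


def brownian_on_grid(n_fine, seed=0):
    """Simulate B at the points j / n_fine, j = 0..n_fine, with B_0 = 0."""
    rng = random.Random(seed)
    path = [0.0]
    step_sd = (1.0 / n_fine) ** 0.5
    for _ in range(n_fine):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def interpolate(path, n, t):
    """Piecewise linear interpolant b_n(t) through the n + 1 nodes i / n.
    Requires that n divides len(path) - 1 and that 0 <= t <= 1."""
    n_fine = len(path) - 1
    stride = n_fine // n
    i = min(int(t * n), n - 1)  # index of the interval containing t
    left, right = path[i * stride], path[(i + 1) * stride]
    w = t * n - i  # relative position inside the interval
    return (1.0 - w) * left + w * right


def deformed(path, n, t, f):
    """Discretized deformed process X_n(t) = f(b_n(t))."""
    return f(interpolate(path, n, t))


if __name__ == "__main__":
    path = brownian_on_grid(1024)
    for t in (0.0, 0.3, 1.0):
        print(t, deformed(path, 16, t, lambda s: s * s),
              deformed(path, 16, t, lambda s: min(s, 1.0)))
```

At the nodes the interpolant reproduces the simulated path exactly, and the two deformations shown correspond to the examples $f(t) = t^2$ and $f(t) = \min(t, 1)$.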



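6.4. Stochastic integrals. We now consider prior models defined with stochastic integrals

$$X(t) = \int_0^t f(s, \omega)\, dB_s, \qquad t \in [0, T],$$

where $B_s$ is a Brownian motion and $f : [0, T] \times \Omega \to \mathbb{R}$ is in the class $V([0, T])$ that satisfies the following conditions. A function $f \in V([0, T])$ is $\mathcal{B}([0, T]) \times \mathcal{F}$-measurable, $f(t, \omega)$ is $\mathcal{F}_t$-adapted and $\mathbf{E}[\int_0^T f(t, \omega)^2\, dt] < \infty$. Here $\mathcal{F}_t$ is the $\sigma$-algebra generated by all $B_s$, $s \le t$ (see [111]). Furthermore, we assume that $f$ satisfies

$$\lim_{n\to\infty} \mathbf{E}\left[ \int_0^T \left( \sum_{i=1}^{n} f(t_{i-1}^{(n)}, \omega)\, 1_{[t_{i-1}^{(n)}, t_i^{(n)})}(t) - f(t, \omega) \right)^2 dt \right] = 0,$$

where $0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = T$ are such that $\max_i (t_i^{(n)} - t_{i-1}^{(n)}) \to 0$ as $n \to \infty$.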

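With probability one, $X(t)$ has continuous sample paths (see [111]). As in the previous section, we may interpret $X$ as a $C([0, T])$-valued random variable. One discrete approximation is to take

$$X_n(t) = \sum_{i=1}^{n} X_n(t_i^{(n)})\, \phi_i(t),$$

where the functions $\phi_i$ are the linear interpolation functions, and

$$X_n(t_i^{(n)}; \omega) = \sum_{j=1}^{i} f(t_{j-1}^{(n)}, \omega) \left( B_{t_j^{(n)}}(\omega) - B_{t_{j-1}^{(n)}}(\omega) \right) = \int_0^{t_i^{(n)}} \sum_{j=1}^{n} f(t_{j-1}^{(n)}, \omega)\, 1_{[t_{j-1}^{(n)}, t_j^{(n)})}(t)\, dB_t(\omega)$$

are approximations of the stochastic integrals $X(t_i^{(n)})$ for $i \ge 1$ and $X_n(0) = 0$. Then $X_n$ has a subsequence that converges to $X$ in $C([0, T])$. Indeed,

$$\sup_t |X(t) - X_n(t)| \le \left\| X(t) - \sum_{i=1}^{n} X(t_i^{(n)})\, \phi_i(t) \right\|_\infty + \sup_{1 \le i \le n} |X(t_i^{(n)}) - X_n(t_i^{(n)})|,$$

where, by Doob's inequality and the Itô isometry, the latter term satisfies

$$\mathbf{E}\left[ \sup_{1 \le i \le n} |X_n(t_i^{(n)}) - X(t_i^{(n)})|^2 \right] \le C \sup_{1 \le i \le n} \mathbf{E}\left[ |X_n(t_i^{(n)}) - X(t_i^{(n)})|^2 \right] \le C\, \mathbf{E}\left[ \int_0^T \left( \sum_{j=1}^{n} f(t_{j-1}^{(n)}, \omega)\, 1_{[t_{j-1}^{(n)}, t_j^{(n)})}(t) - f(t, \omega) \right)^2 dt \right],$$

which converges to zero as $n \to \infty$. In particular, the sequence $\{\sup_{1 \le i \le n} |X_n(t_i^{(n)}) - X(t_i^{(n)})|\}_n$ converges in $L^2(P)$ and therefore has an a.s. convergent subsequence $\{\sup_{1 \le i \le n_j} |X_{n_j}(t_i^{(n_j)}) - X(t_i^{(n_j)})|\}_j$. Therefore, $\mu_{X_{n_j}}$ converges weakly to $\mu_X$. The same argument shows that any subsequence of the measures $\mu_{X_n}$ has a weakly converging subsequence with the same limit $\mu_X$, and hence the measures $\mu_{X_n}$ converge weakly to $\mu_X$ on $C([0, T])$.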
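The left-point sums used above can be sketched for the concrete integrand $f(t, \omega) = B_t(\omega)$, which is our illustrative choice; in that case the Itô integral is known to be $\int_0^T B\, dB = (B_T^2 - T)/2$. The uniform grid, the seed, and the function names are assumptions.

```python
import random


def simulate_increments(n, T=1.0, seed=0):
    """Independent Brownian increments over a uniform grid of n steps on [0, T]."""
    rng = random.Random(seed)
    sd = (T / n) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]


def ito_left_point_sum(increments):
    """Left-point sum of f(t_{j-1}) (B_{t_j} - B_{t_{j-1}}) for f(t, omega) = B_t(omega).
    Returns the sum together with the terminal value B_T."""
    b = 0.0
    total = 0.0
    for db in increments:
        total += b * db  # integrand evaluated at the left end point (adaptedness)
        b += db
    return total, b


if __name__ == "__main__":
    inc = simulate_increments(20_000)
    total, b_T = ito_left_point_sum(inc)
    print(total, (b_T ** 2 - 1.0) / 2.0)
```

On every grid the sum equals $(B_T^2 - \sum_j (\Delta B_j)^2)/2$ exactly, and the discrete quadratic variation $\sum_j (\Delta B_j)^2$ approaches $T$ as the grid is refined, which is why the two printed numbers are close.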

6.5. Hyperparametric models. We consider here simple hyperparametric models. Approximations and convergence results in a more complicated case concerning edge-preserving Gaussian hierarchical models were obtained in [63].

Let $\lambda$ be a Borel measure on $\mathbb{R}^d$. Let $\nu_n^t$, $n \in \mathbb{N}$, be Borel probability measures on a locally convex Souslin topological vector space $F$ for all $t \in \mathbb{R}^d$ and let $t \mapsto \nu_n^t(U)$ be $\lambda$-measurable for all $U \in \mathcal{F}$ and $n$. If $\nu_n^t$ converge weakly to the probability measure $\nu^t$ for all $t$, then the hierarchical prior models $\mu_n(U) = \int \nu_n^t(U)\, d\lambda(t)$ converge weakly to $\mu(U) = \int \nu^t(U)\, d\lambda(t)$. Indeed, if $f$ is a continuous bounded function on $F$, then

$$\lim_{n\to\infty} \mu_n(f) = \int \lim_{n\to\infty} \nu_n^t(f)\, d\lambda(t) = \mu(f).$$

For example, take $X = aZ$, where $Z$ is zero mean Gaussian with covariance $C$, and $a$ is an ordinary random variable independent from $Z$ (so-called scale-mixing). We take $Z_n$ to be the linear discretizations $P_n(Z)$ and $X_n = aZ_n$, where $P_n(Z) \to Z$ a.s. as $n \to \infty$. We denote the distribution of $a$ by $\lambda$. Then the hierarchical prior distributions

$$\mu_{X_n}(U) = \int \mu_{tZ_n}(U)\, d\lambda(t)$$

converge weakly to $\mu_X$ as $n \to \infty$. This holds in particular for sub-Gaussian processes. Moreover, all spherically $H$-symmetric nonatomic measures are mixtures of Gaussian measures $\mu_{tZ}$, where $Z$ is centered Gaussian with infinite-dimensional Cameron-Martin space $H$, by Theorem 7.4.2 in [12].

If only the distribution of the hyperparameters $a \in \mathbb{R}^d$ is approximated, we may get stronger convergence. Indeed, let $Z$ be any $F$-valued random variable and let $a_n$ be a sequence of hyperparameters with probability densities $\lambda_n(t)$ on $\mathbb{R}^d$. We set $X = aZ$ and $X_n = a_n Z$. If the densities $\lambda_n(t)$ converge almost everywhere to the density $\lambda(t)$ of the hyperparameter $a$ and the densities are uniformly bounded, then the hierarchical prior distributions

$$\mu_{X_n}(U) = \int \mu_{tZ}(U)\, \lambda_n(t)\, dt$$

converge in variation to $\mu_X$. Under the conditions of Theorem 4.7, also the corresponding posterior distributions converge in variation.
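The convergence in variation of the priors can be traced to a one-line estimate. The following derivation is a sketch that uses only the mixture representation above together with Scheffé's lemma (almost everywhere convergence of probability densities to a probability density implies convergence in $L^1$); it is added here as an illustration and is not quoted from the original.

```latex
\begin{align*}
\sup_{U \in \mathcal{F}} \bigl| \mu_{X_n}(U) - \mu_X(U) \bigr|
  &= \sup_{U \in \mathcal{F}} \Bigl| \int_{\mathbb{R}^d} \mu_{tZ}(U)\,
     \bigl( \lambda_n(t) - \lambda(t) \bigr)\, dt \Bigr| \\
  &\le \int_{\mathbb{R}^d} \bigl| \lambda_n(t) - \lambda(t) \bigr|\, dt
     \qquad \text{since } 0 \le \mu_{tZ}(U) \le 1 \\
  &\longrightarrow 0 \quad (n \to \infty)
     \qquad \text{by Scheff\'e's lemma.}
\end{align*}
```

The bound does not depend on the distribution of $Z$, which is consistent with $Z$ being allowed to be any $F$-valued random variable in this construction.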

6.6. Linear discretizations of random variables. In [97], a sequence of random variables $X_n$ having values in a separable Banach space $F$ (equipped with the Borel $\sigma$-algebra) is called a proper linear discretization of an $F$-valued random variable $X$ if $X_n$ converge to $X$ weakly in distribution (i.e. $X_n = P_n X$ for a sequence of finite-rank operators $P_n$ on $F$ and $\mu_{\langle X_n, \phi \rangle}$ converge weakly to $\mu_{\langle X, \phi \rangle}$ for all continuous linear functionals $\phi \in F'$). The definition of a proper linear discretization is too weak for the present convergence results. The reason is that the weak convergence in distribution is equivalent to the convergence of the characteristic functionals $\hat\mu_{X_n}(\phi)$ to the characteristic functional $\hat\mu_X(\phi)$ for all $\phi \in F'$ (see Theorem 7.6 in [10]), and the weak convergence of the approximated prior distributions $\mu_{X_n}$ is guaranteed if the measures $\mu_{X_n}$ are additionally uniformly tight (see Corollary 3.8.5 in [12]). In order to get convergent CM estimates, Lassas et al. applied in [96] the stronger condition that $P_n x$ converge in norm to $x$ for all $x \in F$. We follow the Gaussian case [94] and call $X_n$ a measurable linear discretization of $X$ if there exist measurable operators $P_n$ having finite-dimensional ranges on $F$ such that $X_n = P_n(X)$ and $\mu_{X_n}$ converge weakly to the measure $\mu_X$ on $F$. Moreover, we call $X_n$ a continuous linear discretization of $X$ if there exist finite-rank operators (i.e. bounded linear operators with finite-dimensional ranges) $P_n$ on $F$ such that $X_n = P_n X$ and $\mu_{X_n}$ converge weakly to the measure $\mu_X$. We briefly discuss the existence of certain linear discretizations.

The notion of a continuous linear discretization is related to the so-called $\mu$-approximation property. Let $\mu$ be a Radon probability measure on a separable Banach space $F$. The space $F$ is said to have the $\mu$-approximation property if there exist finite-rank operators $P_n$ converging $\mu$-a.s. to the identity on $F$. Moreover, the space $F$ is said to have the stochastic approximation property if it has the $\mu$-approximation property for every Radon probability measure $\mu$ (see [47]). The stochastic $\mu_X$-approximation property gives finite-rank operators $P_n$, which define continuous linear discretizations of $X$ by $X_n = P_n X$. In [47], it was demonstrated that not all separable Banach spaces have the stochastic approximation property. Hence, not all separable Banach space-valued random variables $X$ have almost everywhere converging continuous linear discretizations.

In [47] it was shown that on separable Banach spaces the stochastic approximation property coincides with the existence of a stochastic basis of Herer: a biorthogonal system $(e_k, f_k)$ in $(F, F')$ such that $x = \sum_{k=1}^{\infty} f_k(x) e_k$ for $\mu$-almost every $x$ in $F$ [65]. Candidates of the type $X_n = \sum_{k=1}^{n} f_k(X) e_k$ are therefore plausible for almost everywhere converging continuous linear discretizations of $X$. Moreover, if the coefficients $f_k(X)$ are mutually statistically independent and their distributions are equivalent to the Lebesgue measure, then many properties of Gaussian measures hold also for $\mu_X$ [126]. For example, the measures $\mu_{X+x_0}$ are then absolutely continuous with respect to $\mu_X$ for every $x_0$ in the linear span of $\{e_k\}$, the linear span of $\{e_k\}$ has either $\mu_X$-measure zero or one, and the topological support of $\mu_X$ coincides with the closure of the linear span of $\{e_k\}$ in $F$.

Continuous linear discretizations of Souslin space- valued unknowns X with prop- 
erty fix{H) = 1 for some separable Hilbert space H were considered in [116] (see 
the discussion in Section 1.3). 

In some cases, the prior distribution on a separable Fréchet space $F$ may not be quite what we expect. Okazaki [110] proved the remarkable result that for any separable Fréchet space $F$ equipped with the Borel $\sigma$-algebra, a probability measure $\mu$ and a stochastic basis $(e_k, f_k)_{k \in \mathbb{N}}$, there exists a separable Banach space $B$ such that $B \subset F$ continuously, $\mu(B) = 1$ and the stochastic basis $(e_k, f_k)_{k \in \mathbb{N}}$ is actually a Schauder basis for $B$. The prior distribution $\mu$ on $F$ could be replaced with a prior distribution on the Banach space with the Schauder basis. The above result refines Kuelbs' classic result that any Radon measure on a separable Fréchet space has Banach support (see [90], or e.g. Theorem 3.6.5 in [12]).

6.7. Uniformly distributed sequences. We suggest an approximation method that has been historically valued as a competitor to Monte Carlo methods. The best known application of the approximation method is the so-called quasi-Monte Carlo method (see [109]).

Let $F$ be a Hausdorff space and $\mu$ a finite Borel measure on $F$. A sequence $\{x_i\}_{i=1}^{\infty} \subset F$ is called $\mu$-uniformly distributed if

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_F f\, d\mu$$

for all continuous and bounded $f$ on $F$. That is, the average $n^{-1} \sum_{i=1}^{n} \delta_{x_i}$ of point masses converges weakly to $\mu$. Roughly speaking, it is the law of large numbers with a predetermined sequence.

For any Borel probability measure on a locally convex Souslin space $F$, there exists some uniformly distributed sequence $\{x_i\}$ (see Section 8.10 (ix) in [13]). A low dimensional example of such a sequence is the Hammersley sequence for the Lebesgue measure on the unit square [60]. This means that the prior distribution $\mu_X$ on $(F, \mathcal{F})$ may be approximated by measures $\mu_{X_n} = n^{-1} \sum_{i=1}^{n} \delta_{x_i}$, whose prior information states that a realization of the random variable $X_n$ is one of the values $x_i$, $i = 1, \ldots, n$, and there is no preference between the values $x_i$, $i = 1, \ldots, n$. The uniformly distributed sequence is also a possible tool for interpreting prior information.
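A low-dimensional sketch of this idea is given below. It builds the $n$-point Hammersley set in the unit square from the base-2 radical inverse and uses the equal-weight point-mass measure to approximate an integral against the Lebesgue measure. Note the assumption made here: the construction $(i/n, \phi_2(i))$ is the standard finite Hammersley point set, which depends on $n$; the function names and the test integrand are illustrative.

```python
def radical_inverse_base2(i):
    """Van der Corput radical inverse of the non-negative integer i in base 2:
    the binary digits of i are mirrored about the binary point."""
    value, scale = 0.0, 0.5
    while i > 0:
        value += scale * (i & 1)
        i >>= 1
        scale *= 0.5
    return value


def hammersley_points(n):
    """The n-point Hammersley set (i / n, phi_2(i)), i = 0..n-1, in the unit square."""
    return [(i / n, radical_inverse_base2(i)) for i in range(n)]


def empirical_mean(f, points):
    """Integral of f against the equal-weight point-mass measure on the points."""
    return sum(f(x, y) for x, y in points) / len(points)


if __name__ == "__main__":
    pts = hammersley_points(1024)
    # The exact integral of x * y over the unit square is 1/4.
    print(empirical_mean(lambda x, y: x * y, pts))
```

Every point carries the same weight $1/n$, which mirrors the interpretation above that the approximating prior expresses no preference between the values $x_1, \ldots, x_n$.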

7. Conclusions 

The generalized Bayes formula is an efficient tool for obtaining posterior convergence in the weak topology of measures or in the stronger topologies of setwise convergence and convergence in variation (cf. Theorem 4.4 and Theorem 4.7). In the case where only the hyperparameters of a hierarchical model are approximated, we verified that the posterior distributions converge in variation when the approximations are refined (cf. Section 6.5). In Section 5 we gave examples of applicable non-Gaussian noise models. The explicit expressions of posterior distributions derived in Section 5 for simple non-Gaussian noise models may serve as model cases for further studies on the effects of non-Gaussianity of the noise distribution. In particular, a Kakutani type generalization of the Cameron-Martin formula was derived. We anticipate that this generalization opens a way for a wide class of non-Gaussian noise models in statistical inverse problems, especially when used in connection with wavelet expansions. Another example demonstrates the surprising fact that the posterior distribution given an infinite-dimensional observation can have a significantly simpler expression than the posterior distribution given a corresponding finite-dimensional observation. This suggests that in some cases the infinite-dimensional model could provide new numerical approximation schemes.
It is well known that the generalized Bayes formula holds when the measures $\mu_{\varepsilon+L(x)}$ are $\mu_X$-almost surely dominated, i.e. absolutely continuous with respect to some $\sigma$-finite measure for $\mu_X$-almost every $x \in F$. We showed that there is a curious interplay between the continuity of posterior distributions with respect to the observations and the $\mu_X$-a.s. domination of $\mu_{\varepsilon+L(x)}$ (cf. Theorem 2.8 and Remark 4). The continuity of the posterior distribution with respect to observations is only possible in the dominated case, which means that in the undominated cases some posterior distributions have discontinuities. The discontinuities of the posterior distributions may enhance the errors caused by replacing the required sample of $Y_n = L(X_n) + \varepsilon$ in the posterior distribution of $X_n$ given $Y_n$ by the actual observation of $Y = L(X) + \varepsilon$. Moreover, the regularizing effect of the prior distributions on an ill-posed inverse problem could be of limited power.

Continuity of the posterior distributions with respect to the observations also has other roles in statistical inverse problems. It helps to reduce the nonuniqueness of posterior distributions in quite general cases (cf. Theorem 2.7).

In Section 6.6, we discussed the linear discretizations of the unknown X. We 
remarked that on arbitrary separable Banach spaces there does not always exist a 
continuous linear discretization by a result of Fonf [47]. The present convergence 
results, which are written in the same spirit as in [63], are therefore important since 
they do not require the pointwise convergence of the discretization operators. Besides 
continuous linear discretizations, other approximation methods can therefore be 
used. One of them is a generic method for approximating any prior measure on a 
locally convex Souslin space with the help of a quasirandom sequence (see Section 
6.7). 

Finally, we list some directions for generalizing this work. We have not studied 
the speed of convergence of posterior distributions, which is a very natural question 
when choosing between different approximation schemes. The generalized Bayes 
formula gives a good framework for this study. Moreover, we considered only 
classical noise models, i.e. statistically independent noise and unknowns, which enabled 
us to write a simple expression for the conditional probability of the observation Y 
given the unknown X. The case of statistically dependent noise and unknown 
is not purely theoretical, since the unknown is often approximated with a simple 
expression and the approximation error is included in the noise term. 
Convergence of the corresponding posterior distributions is therefore an important topic. 
Furthermore, we have not discussed what kind of approximating prior distributions 
could guarantee meaningful convergence of maximum a posteriori (MAP) estimates. 
Note that the question is proper for classical noise models that are statistically 
independent of the unknown, since the posterior distribution of X given a sample 
of $Y = L(X) + \varepsilon$ then depends on the prior model X only through its distribution. 
The example of Lassas and Siltanen on total variation priors shows that the weak 
convergence of the approximating prior distributions is, in general, not sufficient, as 
the MAP estimates then converged to zero. 

There is still a wide class of statistical inverse problems that are covered neither 
by the present work nor by [94, 116], and for which the question of posterior convergence 
remains open. 